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Preface 


Scope 


Track and vertex reconstruction is an important part of the data analysis chain in 
experiments at particle accelerators. This includes fixed-target experiments, experi- 
ments at lepton and hadron colliders, and neutrino experiments where the detector 
is the target. This book deals almost exclusively with the reconstruction of charged 
particles and their production vertices in tracking detectors. The reconstruction 
of neutral particles in calorimeters and particle identification are subjects that are 
outside the scope of the present book; excellent treatments of these topics can be 
found elsewhere. 

The methods presented here are also largely agnostic to the detector technology 
that produces the observations that are the input to the algorithmic chain of track 
finding — track fitting — vertex finding — vertex fitting. The problems that arise 
in producing hit positions with well-calibrated standard errors in the various types of 
tracking detectors are therefore described only very briefly; otherwise, it is simply 
assumed that such positions with correct standard errors are available. Nevertheless, 
an important part of the book deals with methods not sensitive to deviations from 
this ideal case. 

Although track and vertex reconstruction can be formulated largely in geometri- 
cal terms, a statistically sound treatment of the stochastic processes that disturb the 
trajectories of charged particles by interactions with the detector material is manda- 
tory. The modeling of these processes and their incorporation into the reconstruction 
algorithms is therefore a significant topic that is treated in considerable detail. 

The examples in the last part of the book come from the four LHC experiments, 
complemented by two experiments at SuperKEKB and FAIR. Nonetheless, it was 
our aim to present the algorithmic solutions in as general a context as possible. 
Inevitably, some selection of the material in the extensive literature was necessary, 
driven by the available space and potentially somewhat influenced by our own 
experiences and predilections. Still, we aspired to describe in sufficient detail as 
many important contributions as possible; wherever this was not feasible, there are 
pointers to the relevant publications. 
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Content 


The first part of the book is conceived as an introduction. Chapter 1 gives a very brief 
outline of tracking detectors, their basic principles of operation, and the challenges 
they pose for calibration and alignment. Chapter 2 shows how track and vertex 
reconstruction are embedded in the entire event reconstruction chain, from the 
trigger to the physics object reconstruction, and how important they are for the final 
physics analysis. Chapter 3 introduces basic notions from applied mathematics and 
statistics relevant for the core topics of the book: function minimization, regression 
and state space models, and clustering. 

Part II covers the first core topic, track reconstruction. Chapter 4 describes track 
models, starting with the equations of motion of charged particles, followed by 
describing different ways of parametrizing the state of a particle and exhibiting 
algorithms for track and error propagation in various types of magnetic fields. 
It concludes with modeling of material effects and their inclusion in the track 
reconstruction. Chapter 5 is dedicated to track finding. As there is no systematic 
theory of track finding yet, we describe a variety of algorithms that have been and are 
presently successfully deployed in many experiments, including fast track finding 
in real time at the trigger level. Chapter 6 presents established methods for the 
estimation of track parameters, both traditional least-squares estimators and more 
recent robust and adaptive estimators. The special cases of circle and helix fitting 
are given a separate section as is the assessment of track quality. The detection of 
outliers and the finding of kinks in tracks concludes the chapter and Part II. 

Part III is dedicated to the second core topic, vertex reconstruction. Chapter 7 
first introduces the distinction between primary and secondary vertices, then goes 
on to discuss search strategies for finding primary vertices, both in one dimension 
and in 3D space. Various clustering algorithms are presented. Chapter 8 showcases 
methods for vertex fitting, which are very similar to the ones used for track fitting on 
a mathematical level. The concluding section shows how to extend the vertex fit to 
a kinematic fit by imposing additional geometric constraints and conservation laws. 
Part III concludes with Chap. 9, which deals with the reconstruction of secondary 
vertices. As the methodology of the search for secondary vertices is strongly 
influenced by the location of the vertex and the properties of the emerging particles, 
the four most important types are treated in four separate sections. 

Part IV of the book presents case studies of approaches to tracking and vertexing 
from current and future experiments. Given our background in two of the LHC 
experiments and the enormous challenges in all of the LHC experiments, it is maybe 
not surprising that the four of them are our prime examples. They are complemented 
by two experiments not at the LHC, Belle II, and CBM. They have to solve their own 
specific tracking and vertexing problems, somewhat different from the ones typical 
for the LHC, but not less difficult all the same. We have to warn the reader, however, 
that at least some of these examples come with an expiration date as it were. In 2019, 
the LHC experiments have already started preparations for Run 3 of the LHC, which 
is scheduled to start in 2021, and for the high-luminosity phase of the LHC or HL- 
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LHC, planned for operation starting in 2026. It is to be expected that the conditions 
at the HL-LHC require substantial changes in the reconstruction algorithms of 
ATLAS and CMS, especially in the track finding part. Belle II has started operation 
recently and will certainly adapt and optimize its current algorithms with the rise 
of the luminosity of the SuperKEKB B factory. CBM was still in the preparation 
phase in 2019, and although it has found very convincing solutions to its tracking 
and vertexing challenges, it too will profit from experience and probably modify its 
approach if need arises. The reader should therefore keep in mind that the examples 
in Part IV are based on the published material at the time of writing, and that many 
exciting developments are yet to come or have already found their way into the track 
and vertex reconstruction software. 

The appendices following Part IV contain supplementary material. Appendix A 
lists the Jacobian matrices of the parameter transformations treated in Chap. 4; 
Appendix B shows the regularization of the kinematic fit for singular covariance 
matrices; and Appendix C contains a list of software packages for track and vertex 
fitting, as well as entire frameworks with existing, but easily replaceable, modules 
for track and vertex reconstruction. These frameworks can serve as convenient 
testbeds for new ideas and algorithms, addressing users and developers alike. 


Audience 


The book is intended for a wide audience: PhD students who want to gain better 
understanding of the inner workings of track and vertex reconstruction; PhD 
students and postdocs who want to enrich their experience by participating in 
projects that require knowledge of the topic; and researchers of all ages who want 
to contribute to the progress of the field by becoming algorithm and software 
developers. In all cases, the reader is expected to have some basic knowledge of 
linear algebra and statistics. 
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A Note on the References 


WWW addresses (URLs) are given for web resources, technical reports, articles in 
arXiv and open access journals, and material that is hard to find otherwise. 


Typesetting and Notation 


Vectors are typeset in small bold italic letters, for example, a. Unless specified 
otherwise, all vectors are column vectors. The length of vector a is denoted by 
dim(a), its transpose by a', its norm by |a|. The scalar product of two vectors a and 
b is denoted by a - b, and their cross product by a x b. Matrices are typeset in capital 
bold italic letters, for example, A. The rank of matrix A is denoted by rank(A), 
its diagonal by diag(A), its inverse by AT!, and its transpose by AT. The block- 
diagonal matrix with blocks A1, ..., An is denoted by A = blkdiag(A1, ..., An). 
The gradient of a multivariate function F(x) is a row vector and denoted by V F; 
the Hessian matrix is denoted by V?F. The expectation of a random variable z is 
denoted by E [z] and its variance by var [z]. The expectation of a random vector 
e is denoted by E [e] and its covariance matrix by Var [e]. The cross-covariance 
matrix of two random vectors e and ö is denoted by Cov [e, 6]. As usual, ôij is the 
Kronecker delta. 
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Part I 
Introduction 


Chapter 1 A 
Tracking Detectors m 


Abstract The chapter gives an overview of particle detectors, with the emphasis 
on tracking detectors. The working principles and the calibration of gaseous, 
semiconductor, and fiber detectors are explained, followed by a brief review of 
detector alignment. As an illustration, the tracking systems of the four experiments 
at the LHC and two non-LHC experiments, Belle II and CBM, are presented. 


1.1 Introduction 


Many high-energy physics experiments are performed by colliding two beams of 
high energy particles or one beam with a target. The particles produced in a collision 
are recorded by particle detectors. The collision is then studied by reconstructing 
most of the particles produced in the interaction and determining their properties. 
For a general review of particle detectors, see for instance [1, Chap. 7] and [2]. 

The present book concentrates on the reconstruction of charged tracks and 
interaction vertices using information collected by tracking detectors. A tracking 
detector can be a single device such as a wire chamber or a silicon strip sensor, or 
a full-blown tracking system such as a time projection chamber capable of stand- 
alone track reconstruction; see Sect. 1.6. For a review of non-tracking detectors, the 
reader is referred to [1, 3]. 

A charged particle crossing a tracking detector generates a single or a string of 
spatial observations in the local coordinate system of the detector. For track recon- 
struction, these have to be transformed to points in 3D space usually with different 
precisions in the three coordinates. Accordingly, the correct transformations are 
determined by the alignment procedure; see Sect. 1.5. The resulting space points or 
"hits" are collected into track candidates by the track finding. In the subsequent track 
fit, the track parameters are estimated and the track hypothesis is tested. Successfully 
reconstructed tracks are then clustered into production or decay vertices, followed 
by a vertex fit and a test of the vertex hypothesis. 
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4 ] Tracking Detectors 


There are three principal types of tracking detectors: gas-filled or gaseous 
tracking detectors; solid state tracking detectors, usually equipped with silicon 
sensors; and scintillating fiber trackers. They are described in the following three 
sections. 


1.2 Gaseous Tracking Detectors 


Gaseous tracking detectors utilize the ionizing effect of charged particles in a 
volume of gas. The simplest gaseous detector is the Geiger-Miiller counter, a tube 
with a central wire. A potential difference of 1-3 kV between the tube wall and the 
wire causes the primary electrons and ions to move towards the anode (wire) and 
the cathode (tube wall), respectively. Because of the large field strength close to the 
wire, the primary electrons generate an avalanche of secondary electrons and ions, 
resulting in a detectable signal fed into an amplifier. 

In order to use this principle for the measurement of the position of a charged 
particle, a structured array of many elements has to be designed. The following 
subsections describe a few common types of such arrays. 


1.2.1 Multi-wire Proportional Chamber 


The multi-wire proportional chamber (MWPC, [1, Sect. 7.1]) is a thin cuboid 
volume of gas, oriented approximately perpendicular to the passage of the particles 
to be measured. The volume is bounded by a pair of conductive plates acting as the 
cathode. Inside the volume an array of anode wires detects the passage of charged 
particles, see Fig. 1.1. With a typical wire spacing of 1 mm a spatial resolution 
of around 0.3 mm can be achieved in the v-coordinate orthogonal to the wires. A 
further improvement can be obtained by tilting the MWPC so that the probability of 
a particle giving signals on two adjacent wires gets larger or by rotating the chamber 
and measuring the drift times to the anode wires and estimating the point of passage 
from the drift distances [4, 5]. The resolution in the w-coordinate along the wire is 
poor, equal to the wire length divided by 4/12 ~ 3.46. 


1.2.2 Planar Drift Chamber 


A planar drift chamber is similar in shape to an MWPC (see Fig. 1.1), but the 
electrical field is shaped by an alternating array of sense (anode) and field (cathode) 
wires; see [1, Sect. 7.2] and [6]. The electrons and ions from the primary ionization 
drift to the respective electrodes, and gas amplification in the strong field close to 
the sense wires gives a detectable signal that is amplified. In addition, the drift time 
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wires or strips 


cv 


Fig. 1.1 Local coordinate system in a wire chamber, in a planar drift chamber, or in a silicon strip 
sensor 


between the crossing time of the particle and the signal time on the sense wire is 
measured. The drift distance can then be computed from the drift time. Given a 
precise calibration of the drift distance, a spatial resolution below 0.1 mm in the 
coordinate v orthogonal to the wire can be achieved; see for instance [7]. 

Drift chambers need far fewer electronic channels than MWPCs, but have to be 
monitored and calibrated much more carefully. In addition, there is an inherent left- 
right ambiguity of the spatial position relative to the sense wire, so that each hit has a 
mirror hit. This ambiguity must be resolved in the track finding or track fitting stage. 
The spatial position w along the wire can be measured by comparing the charges 
at both ends of the wire. A resolution of a couple of millimeters can be achieved in 
this way [8]. 


1.2.3 Cylindrical Drift Chamber 


Cylindrical drift chambers [1, Sects. 7.2 and 7.3] have been and still are widely 
used in collider experiments. Such a chamber consists of up to 60 cylindrical 
layers of alternating field and sense wires mostly parallel to the beams of the 
collider; see Fig. 1.2. In the local coordinate system of the chamber, the z-axis is the 
symmetry axis of the chamber. The measured point in the transverse plane can be 
given in polar coordinates (Rm,®m,Zm), where Rm is the radial position of the sense 
wire, and ®, is the polar angle of the wire plus/minus the drift angle ®4, i.e., the 
drift distance d divided by Rm. The resolution of the drift distance is typically in 
the order of 0.1-0.2 mm; see for instance [9, 10]. Like in a planar drift chamber, 
each hit has a mirror hit, and the ambiguity must be resolved in the track finding 
or track fitting stage. The z-coordinate can be measured by charge division or by 
adding "stereo" layers of wires tilted with respect to the “axial” layers. The resulting 
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Fig. 1.2 Local coordinate 
system in a cylindrical drift 
chamber. Only one layer of 
sense wires is shown. Rm is 
the radial distance of the 
sense wire from the z axis; 
@,, is the azimuth angle of 
the sense wire. d is the drift 
distance; q = d/ Ry is the 
angle spanned by d at radius 
Rm. The azimuth angles of 
the track hit and its mirror hit 
are Py | = Py — Da, 

Py = Py + a. The 
z-coordinate may or may not 
be measured directly 
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spatial resolution of zm is equal to the drift distance resolution divided by the sine 
of the stereo angle, typically 2-3 mm. 


1.2.4 Drift Tubes 


Drift tubes are small drift chambers with a single sense wire. They can have 
rectangular or circular cross sections, and can be arranged in cylindrical or planar 
layers. In planar layers, wires can run in two or more directions, giving good spatial 
resolution in two orthogonal directions. The resolution of the drift distance is in the 
order of 0.1-0.2 mm. 


1.2.5 Time Projection Chamber 


The typical time projection chamber (TPC), as employed in many collider experi- 
ments, is a large gas-filled volume shaped as a hollow cylinder, the axis of which is 
aligned with the beams and the magnetic field [1, Sects. 7.3.3], see Fig. 1.3. There 
is a potential difference between the central cathode plane and the two anode end 
plates. The latter are equipped with position sensors. A charged particle traversing 
the chamber ionizes the gas, and the electrons travel along the field lines towards 
the end plates, where both the point and the time of arrival are measured and 
recorded. A track therefore generates a dense string of space points. The position 
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Fig. 1.3 Local coordinate o 
system in a time projection o 
chamber. The position in the 
endplate can be given in polar 
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sensors at the end plates can be wire chambers or micro-pattern gas detectors. In the 
drift direction z a spatial resolution in the order of 1 mm is possible; the transverse 
resolution depends on the technology of the endplate sensors and increases with the 
drift distance because of diffusion. Resolutions well below 0.1 mm can be achieved 
with GEM chambers on the end plates. Another important calibration issue is the 
correction of distortions arising from space charge effects, see for instance [11]. 


1.2.6 Micro-pattern Gas Detectors 


The earliest micro-pattern gas detector is the micro-strip gas detector, in which wires 
are replaced by microscopic metal strip structures deposited on high-resistivity 
substrates [1, Sect. 7.4]. With typical strip distances of 75 um, a spatial resolution 
below 20 um can be achieved. Other developments are the Gas Electron Multiplier 
(GEM) and Micro-Mesh Gaseous Structure (Micromegas) chambers. Resolutions 
down to 10 um can be attained by these devices. Due to their small size and fast 
collection of positive ions, they can be operated at high rates up to several MHz per 
mm”. 


1.3 Semiconductor Tracking Detectors 


Semiconductor tracking detectors [1, Sect. 7.5] are mostly made of thin silicon 
wafers, approximately 0.3mm thick. The n-type silicon is processed by photo- 
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lithographic methods to create p*-doped implants on one side, either thin strips 
(see Sect. 1.3.1) or small pixels (see Sect. 1.3.2). Each strip or pixel is connected 
to a read-out channel. The silicon bulk is fully depleted by a bias voltage. A 
charged particle crossing the wafer creates electron-hole pairs along its path. The 
electrons and the holes drift towards the electrodes and induce signals on the read- 
out electrodes. Silicon drift detectors employ the measurement of the electron drift 
time for measuring the position of a crossing particle; see Sect. 1.3.3. 


1.3.1 Silicon Strip Sensors 


Large-area semiconductor tracking systems employ silicon strip sensors to keep 
construction costs affordable. The implants in the silicon wafers are narrow strips 
with a typical width of 20 um and a typical inter-strip distance of 100 um. The local 
coordinate system is shown in Fig. 1.1. The spatial resolution in v, orthogonal to the 
strip direction, depends on the track direction via the cluster size, i.e., the number of 
adjacent strips with a signal above threshold, and on the Lorentz angle [12]. Under 
optimal conditions the resolution is in the order of 10 um or better. The resolution 
in w, parallel to the strip direction, is equal to the strip length divided by 4/12. A 
typical strip length is 5 cm; shorter strips with a length below a centimeter are called 
mini-strips. For an example of the calibration procedure, see [13]. 

Two-dimensional (2D) measurements can be achieved by implanting strips on 
the back side of the wafer, either orthogonal to the ones on the front side, or at 
a different stereo angle. Alternatively, two one-sided sensors can be glued on the 
same mechanical support, separated by a small gap. If such a sensor happens to be 
crossed by n particles at the same time, up to n strips on each side can give a signal, 
resulting in up to n? points of intersection. At most n of these correspond to the true 
particle positions; the remaining ones are spurious or ghost hits. 


1.3.2 Hybrid Pixel Sensors 


In a pixel sensor, the implants are small square or rectangular pixels in a high- 
resistivity silicon wafer. The pixels are connected to the read-out channel by bump 
bonding. Depending on the pixel size and the track direction, pixel sensors can have 
a position resolution below 10 um in both coordinates, especially if the signal height 
is measured and used to interpolate between pixels in a cluster. For an interesting 
example of the calibration procedure, see [14]. Here, the position is estimated 
from precomputed cluster templates, considering the incident angle and the Lorentz 
angle. The templates can also be used to decide whether an observed cluster is 
compatible with a predicted incident angle. 

Pixel sensors are mostly employed in the region close to the interaction point, 
as they can deal with the high track density and the high background radiation 
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Fig. 1.4 Three-dimensional sketch of a multi-anode silicon drift detector. The trajectories of 
several signal electrons are indicated. The distance between the point where the ionizing particle 
crosses the middle plane of the detector and the array of anodes is obtained from measurement of 
electrons drift time. The second coordinate is given by the location of the anode pad where the 
signal electrons are collected. An improvement of resolution along this coordinate comes from 
charge sharing among several anodes. (From [16], by permission of Elsevier) 


better than strip detectors. In addition, their excellent spatial resolution allows the 
separation of secondary decay vertices very close to the primary interaction vertex; 
see Chap. 9. 

In monolithic pixel sensors, the sensitive volume and part of or the full read-out 
circuits are combined in one piece of silicon. The generated charge is collected on 
a dedicated collection electrode so that there is no need for delicate and expensive 
bump bonding. For an application of monolithic pixel sensors in a vertex detector, 
see Sect. 1.6.1.1. 


1.3.3 Silicon Drift Sensors 


In a silicon drift sensor, electrons are transported parallel to the surfaces of the 
sensor to an anode segmented into small pads [15, 16]. The position information 
along the drift direction is obtained from a measurement of the drift time of the 
electrons. The position in the second coordinate is obtained from charge sharing 
between adjacent pads; see Fig. 1.4. Silicon drift sensors have been deployed in 
STAR and ALICE, see [16, 17]. 


14 Scintillating Fiber Trackers 


Scintillating fiber trackers combine the speed and efficiency of plastic scintillators 
with the geometric flexibility and hermeticity provided by fiber technology [18]. 
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The fibers in such a tracking detector serve two functions: they convert the ionisation 
energy deposited by a passing charged particle into optical photons, and guide the 
optical signal to the devices that detect the generated light. In recent applications, 
these devices are silicon photomultipliers, which are fast, compact and sensitive to 
single photons [19]. 

The spatial resolution of a fiber is approximately equal to the diameter divided by 
„12, while the number of photons scales linearly with the diameter. The conflicting 
requirements on accuracy and light yield can be alleviated by staggering the fibers 
and by choosing a material with high intrinsic scintillation yield and long attenuation 
length. For an example of a large-scale scintillating fiber tracker designed for 
operation in the LHCb experiment from Run 3 of the LHC onward, see [20]. 


1.5 Alignment 


The position measurements in the tracking detectors mentioned above are generated 
in the local coordinate system of the devices. In order to be useful for track 
reconstruction, they have to be converted to positions in the global coordinate 
system of the experiment along with the associated covariance matrices. As tracking 
detectors are very precise instruments, with position resolutions ranging from a 
couple of hundred micrometers down to about ten micrometers, their positions, 
orientations, and possible deformations have to be known with a similar or better 
precision. The importance of correct alignment, especially in the complex detectors 
of the LHC era, is attested by a series of workshops held at CERN in the past [21— 
23]. 

Misalignment or insufficient alignment has a deleterious effect on the efficiency 
of track and vertex reconstruction [21, p. 105]. Random misalignment also degrades 
the resolution of track and vertex parameters and subsequently of invariant masses. 
Moreover, systematic misalignment of larger substructures can cause a bias in the 
estimates of track momenta and vertex positions. This can be harmful in many of 
the physics analyses of the experiment. 

Misalignment can have several sources: finite precision of the detector assembly, 
thermal and magnetic stresses on mechanical structures, sagging of wires or sensors 
because of gravity, changes in temperature and humidity, etc. Since misalignment 
can and does vary over time, constant monitoring is a necessity. 

Alignment proceeds through several steps. The starting point is the ideal 
geometry, augmented by knowledge of the machining and assembly precision. The 
next step is alignment by hardware using lasers for measuring distances or proximity 
and tilt sensors. For instance, the ATLAS silicon tracker can be monitored optically 
by Frequency Scanning Interferometry to a precision of about 10 um [24]. 

The final step is track-based alignment, either with tracks from cosmic muons, 
or from collisions, or both. Actually alignment profits from different types of tracks 
that cross different parts of the detector under different angles. For instance, tracks 
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from collisions hardly ever cross the entire central tracker of a collider experiment, 
but cosmic tracks do. 

Track-based alignment can be split into internal alignment and external align- 
ment. Internal alignment refers to the relative alignment of a tracking system, 
whereas external alignment refers to the alignment of the various tracking systems 
to the global frame of the experiment, which is usually tied to the beam pipe or 
some other part of the accelerator infrastructure. Even the internal alignment of a 
tracking system can be a big challenge. For instance, the current silicon tracker of 
the CMS detector has more than 10* sensors to be aligned, each with six degrees 
of freedom, not counting deformations of the sensors under gravity or thermal 
stresses. The estimation of O(10°) parameters is a highly non-trivial problem. A 
solution that has become a de-facto standard is the experiment-independent program 
Millepedell [25, 26], which performs a simultaneous fit of (global) alignment 
parameters and (local) track parameters, allowing to include laser and survey data as 
well as equality constraints in the fit. For one of the alternative algorithms developed 
for track-based alignment, see [27]. 


1.6 Tracking Systems 


Track reconstruction requires a minimal number of space points per track. A 
tracking system is a device that has enough information for stand-alone track recon- 
struction. A typical collider experiment has three tracking systems for momentum 
measurement: the vertex detector, the central tracker, and the muon tracking system. 
Fixed-target experiments frequently have vertex detectors as well, complemented by 
a magnetic spectrometer for momentum measurement. 

The vertex detector is the tracking system closest to the beam, with the purpose 
to give very precise position and direction information of the tracks produced in a 
collision, so that decays very close to the interaction point can be detected with large 
efficiency; see Chap. 9. It therefore has the largest precision (smallest measurement 
errors) of all tracking systems. Vertex detectors are usually equipped with pixel 
sensors in order to achieve the required precision. 

The central or inner tracker of a collider experiment is positioned between 
the vertex detector and the calorimeters. It is normally embedded in a solenoidal 
magnetic field with high bending power. A silicon tracker typically produces O(10) 
hits per track, while a TPC produces O(100) hits per track. The main requirements, 
not always easily satisfied, on the central tracker are: high single-hit precision; 
good capability to resolve two nearby tracks; precise momentum estimation by a 
long lever arm (large diameter); enough redundancy for high-quality track finding; 
hermetic coverage; as little material as possible. In some cases, especially at the 
future high-luminosity LHC (HL-LHC), fast readout is also essential, as trackers 
must be able to contribute to the trigger. 

The muon system is situated behind the calorimeters which, in principle, absorb 
all particles with the exception of muons. Additional iron filters can be employed 
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as well. The muon system can provide an independent measurement of the muon 
momentum, especially for the purpose of triggering; see Sect.2.1. If the muon 
system has to cover a large area, as is the case in the LHC experiments, it is typically 
equipped with proportional chambers, drift chambers or drift tubes. 

The following subsections briefly describe the tracking systems of experiments 
at the LHC, the SuperKEKB B -factory, and the future Facility for Antiproton and 
Ion Research (FAIR). 


1.6.1 Detectors at the LHC 
1.6.1.1 ALICE 


ALICE is a dedicated heavy-ion experiment at the LHC [28]. Its detector is designed 
to study the physics of strongly interacting matter at extreme energy densities. 

Until the end of 2018, the Inner Tracking System (ITS, see Fig. 1.5) of the 
ALICE detector consisted of two barrel pixel layers [29], two layers of silicon 
drift detectors [17], and two layers of double-sided silicon strip detectors. After the 
upgrade in 2019-2020, the ITS consists of seven layers equipped with monolithic 
pixel chips [30]. 


1. ITS 6. EMCAL 
2. VO and TO 7. PHOS 
3. TPG 8. Muons 


4. TRD 9. AD 
5. TOF 


Fig. 1.5 Cut-away view of the ALICE detector. (From https://arxiv.org/abs/1812.08036. ©2015 
CERN for the benefit of the ALICE Collaboration. Reproduced under License CC-BY-4.0) 
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The main tracking device of ALICE is a TPC [31, 32]. It provides up to 159 space 
points per track. The measurement of the energy deposit due to ionization provides 
a powerful tool for particle identification, especially for low-momentum particles, 
see [33]. For track reconstruction in the TPC and global track reconstruction, 
see Sect. 10.1. The TPC is surrounded by a transition radiation detector (TRD) used 
for triggering and electron identification. 

The ALICE muon spectrometer covers only the forward region of the experiment 
and is dedicated to the study of quarkonia production, open heavy flavor production 
and vector meson properties via the muonic decay channel [34]. 


1.6.1.2 ATLAS 


ATLAS [35] is one of the two general-purpose experiments at the LHC, the other 
one being CMS. Its vertex detector (see Fig. 1.6) originally consisted of three barrel 
pixel layers and three end-cap pixel disks on either side [36]. In 2014, a fourth pixel 
layer was inserted in the barrel between the existing pixel detectors and a new beam 
pipe with smaller radius [37]. 

The central tracker of ATLAS [36] consists of two parts: the silicon tracker (SCT) 
with four barrel layers and nine end-cap disks on either side, and the Transition 
Radiation Tracker (TRT) made of “straw tubes” which are proportional counters 
that contribute to particle identification via transition radiation; see Sect. 2.4.1. 
A charged particle hits at least 30 straw tubes on the way through the TRT; 
see Sect. 10.2. 

The ATLAS muon spectrometer [35, Chap. 6] consists of a barrel part and two 
end-caps. The barrel spectrometer contains three concentric layers, each with eight 
large and eight small chambers of drift tubes. Each end-cap has four disks of drift 
tube chambers and cathode strip chambers. Resistive plate chambers on the barrel 
and thin gap chambers in the end-caps are used for trigger purposes. 


1.6.1.3 CMS 


CMS [38] is, besides ATLAS, the second general-purpose experiment at the LHC. 
Its vertex detector originally consisted of three barrel pixel layers and two end-cap 
pixel disks on either side [39]. In the winter of 2016/2017, this device was upgraded 
with a fourth barrel layer and a third end-cap disk on either side, giving at least four 
hits per track over the full solid angle covered by the detector [40]. 

The silicon strip tracker (SST) of CMS is the largest silicon tracker ever built. It is 
divided into four sections: the inner barrel (TIB), the outer barrel (TOB) and the two 
end-caps (TEC). Depending on its angle with respect to the beam axis, a charged 
particle crosses between eight and 14 sensors, out of which four to six are double- 
sided ones [41]. Track reconstruction in the SST is done mostly in conjunction with 
the Pixel Detector; see Sect. 10.3. 
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Fig. 1.6 Top: Cut-away view of the ATLAS detector. (From [35], reproduced under License CC- 


BY-3.0). Bottom: The central tracker. (From https://collaborationatlasfrance.web.cern.ch/content/ 
tracker) 


The CMS muon system [42] consists of four layers of muon stations inserted in 
the iron return yoke of the solenoid; see Fig. 1.7. The stations in the barrel region 
are equipped with drift tubes, and those in the end-caps are equipped with cathode 
strip chambers. In addition, resistive plate chambers are mounted in both the barrel 
and end-caps of CMS; they are used mainly for triggering. 
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Fig. 1.7 Top: Schematic diagram of a sector of the barrel part of the CMS detector. (From [43], 
reproduced under License CC-BY-4.0). Bottom: Schematic view of the vertex detector and the 
silicon strip tracker. (Courtesy of W. Adam) 


Figure 1.7 shows a schematic diagram of a sector of the barrel part of the CMS 
detector. 


1.6.1.4 LHCb 


LHCb [44] is the experiment at the LHC that is dedicated to precision measurements 
of CP violation and rare decays of B hadrons. Instead of surrounding the entire 
collision point with an enclosed detector as ATLAS and CMS, the LHCb experiment 
is designed to detect mainly particles in the forward direction. 
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Fig. 1.8 Top: View of the LHCb detector. (From [44], reproduced under License CC-B Y-3.0). 
Bottom: Sensor of the Vertex Locator (VELO) 


The core of the LHCb [44] tracking system (see Fig. 1.8) is a silicon microstrip 
detector close to the interaction point, the Vertex Locator (VELO). It can be moved 
to a distance of only 7 mm from the proton beams and measures the position of 
the primary vertices and the impact parameters of the track with extremely high 
precision. 

Up to end of 2018, the tracking downstream of the VELO was accomplished 
by the TT and the T stations. The Tracker Turicensis (TT) is a silicon microstrip 
detector placed upstream of the dipole magnet, which improves the momentum 
resolution of reconstructed tracks and reject pairs of tracks that in reality belong 
to the same particle. The magnet is placed behind the TT. It bends the flight path 
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of the particles in the x-z plane and therefore allows the determination of their 
momenta. The tracking system is completed by the T stations (T1-T2-T3), which, 
together with the information from the VELO and optionally the TT, determine the 
momentum and flight direction of the particles. The T stations are composed of 
silicon microstrip sensors close to the beam pipe and by straw tubes in the outer 
regions. For track reconstruction in LHCb, see Sect. 10.4. 

After the upgrade of LHCb during 2019—2020 and starting with Run 3 of the 
LHC, the tracking downstream of the VELO is done by the SciFi, a homogeneous 
tracking system in scintillating fiber technology; see [20] and Sect. 1.4. 

The LHCb muon system [44, Sect. 6.3], consisting of the muon stations MI 
to M5, provides fast information for the muon trigger at Level 0 and muon 
identification for the high-level trigger and offline analysis. It comprises five stations 
interleaved with absorbers. The stations are mostly equipped with multi-wire 
proportional chambers, with the exception of the central part of the first chamber, 
where GEM detectors are used because of the high particle rate. 


1.6.2 Belle Il and CBM 
1.6.2.1 Belle II 


Belle II [45] is an experiment at the SuperKEKB collider at KEK in Japan. Its 
principal aim is the study of the properties of B mesons. The detector is shown 
in Fig. 1.9. The vertex detector consists of two parts, the pixel detector (PXD) with 
two layers of DEPFET pixels [46] and the Silicon Vertex Detector (SVD) with four 
layers of double-sided silicon strip detectors [47]. 

The central tracking device of Belle II is the CDC, a cylindrical drift cham- 
ber [48]. It has 56 layers of sense wires in nine superlayers, five with a total of 
32 axial wire layers and four with 24 stereo wire layers. The stereo angle is between 
2.6 and 4.2 degrees. For track reconstruction in Belle II, see Sect. 11.1. 

The Belle II KLM system is designed to detect long-lived K -mesons and muons. 
It consists of alternating layers of iron plates, serving as flux return, and active 
detector elements. In the end-caps and the innermost two layers of the barrel, the 
active elements are scintillator strips; the rest of the barrel layers are equipped with 
resistive plate chambers, reused from Belle [49]. 


1.6.2.2 CBM 


The Compressed Baryon Matter (CBM) experiment is a fixed target experiment [50] 
(see Fig. 1.10) at the future FAIR facility for antiproton and ion research. It is 
designed to investigate the properties of highly compressed baryon matter. Its 
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Fig. 1.9 Top: View of the Belle II detector. Bottom: The pixel detector and the silicon vertex 
detector 


central tracking device is the Silicon Tracking System (STS [51]). It is designed 
for high multiplicity, up to 1000 charged particles per interaction, at high rates, up 
to 10 MHz, and consists of eight layers of double-sided silicon sensors between 30 
and 100 cm downstream of the target, inside the magnetic field. 

Each of the three Transition Radiation Detectors [52] is made of four MWPCs. 
Their main task is electron identification. Track reconstruction in the STS and the 
TRD is described in Sect. 11.2. 
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Fig. 1.10 Top: Schematic geometry of the CBM detector. (From [53]). Bottom: Schematic 
geometry of the silicon tracking system. (From [51]) 
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Chapter 2 A 
Event Reconstruction Cheek for 


Abstract The chapter gives an outline of the event reconstruction chain of a 
typical large experiment, from the trigger to the physics object reconstruction. 
The concept of the trigger is illustrated by two examples, CMS and LHCb, 
followed by a discussion of track reconstruction and its stages: hit generation, 
local reconstruction, and global reconstruction. The section on vertex reconstruction 
introduces a classification of vertices and sets the scene for the dedicated chapters 
on vertex finding and vertex fitting. The chapter concludes with some remarks 
on particle identification and reconstruction of physics objects such as electrons, 
muons, photons, jets, x leptons, and missing energy. 


2.1 Trigger and Data Acquisition 


2.1.1 General Remarks 


In the latest and most powerful collider, the Large Hadron Collider (LHC), bunches 
of protons collide with a nominal frequency of 40 MHz, i.e., every 25 ns. Due to 
gaps in the beams, at most 2556 bunches are actually present [1]. As the bunch 
revolution frequency equals 11,245 Hz, the average bunch crossing rate is about 
28.7 MHz. The average number of individual proton collisions per bunch crossing, 
called the pile-up, depends on the luminosity. In the CMS experiment, it varied 
between about five and slightly more than 60 in 2018 [2], so that bunch crossings 
without collisions are extremely unlikely in CMS and likewise in ATLAS. The 
result of a bunch crossing with at least one collision is called an event. In CMS 
and ATLAS, the event rate is virtually the same as the bunch crossing rate. In the 
LHCb experiment, however, not all events are actually visible in the detector, so that 
the effective event rate is lower than the bunch crossing rate; see Sect. 2.1.3. 

The most frequent value of pile-up observed by CMS in 2018 was around 30, 
corresponding to nearly 900 million individual collisions per second. Clearly, it 
is neither possible nor desirable for the experiment to record all of the event data 
produced during data taking. A selection mechanism is therefore required that tags 
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the physically interesting events and activates the recording mechanism, called data 
acquisition (DAQ). Such a selection mechanism is called a trigger. 

Triggers have been deployed for many decades, ever since bubble chambers 
were superseded by experiments with electronic tracking detectors and calorimeters. 
Triggers are needed both in fixed target and in colliding beam experiments because 
of limitations in data rates, storage capacity and computing resources. In order to 
deal with the high event rates typical for electronic experiments, it is important to 
minimize the dead time of the trigger, i.e., the time after an event during which 
the system is not able to process another event. Triggers are therefore implemented 
in several stages or levels, with increasing computational complexity and decision 
time or latency. The principle and possible implementations are best demonstrated 
on examples. In the following the trigger/DAQ systems of the CMS and the LHCb 
experiments will be described in somewhat more detail. 


2.1.2 The CMS Trigger System 


CMS [3] is one of the two general purpose experiments at the LHC; see Sect. 1.6.1.3. 
Its trigger has two levels, the Level-1 Trigger (L1, [4]) and the High-Level Trigger 
(ALT, [5, 6]). During data taking virtually every bunch crossing results in at least one 
collision of protons, and the average primary event rate equals the average bunch 
crossing rate of 28.7 MHz. The trigger, however, must be able to process events 
separated by only 25 ns. 

The task of the L1 trigger is to reduce the primary event rate to less than 100 KHz. 
This is achieved using high-speed customized hardware running up to 128 different 
algorithms. Its inputs come from the calorimeters, the muon detectors, and the 
beam monitors. Its latency is 3.2 us, including data collection, decision making, and 
propagation back to the detector front-ends. A block diagram of the L1 trigger is 
shown in Fig. 2.1. 

The input to the muon trigger (MT) comes from three different detector types. 
The local MTs find track segments, the regional MTs find tracks, and the global 
MT combines the information from all regional MTs, selects the best four muon 
candidates, and provides their momenta and directions. 

The calorimeter trigger (CT) uses information from the both the electromagnetic 
and the hadronic calorimeter. The local CTs compute energy deposits, the regional 
CTs find candidates for electrons, photons, jets, isolated hadrons, and compute 
transverse energy sums. The transverse energy vector Er is defined as: 


Er=E. (ace J l (2.1) 


sin $ cos A 


where ¢ is the azimuth angle and à = 7/2 — 0 is the dip angle of the particle or 
jet direction. The global CT sorts the candidates in all categories, computes total 
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Fig. 2.1 Block diagram of the CMS L1 trigger. The details are explained in the text. (From [4], by 
permission of Elsevier) 


and missing transverse energy sums, and computes jet multiplicities for different 
thresholds. 

Finally, the global trigger makes the final decision which is passed on to the 
detector front-end electronics and the DAQ. 

The HLT is designed to reduce the L1 output rate of 100 KHz to the final recording 
rate of O (100) Hz. It is implemented purely in software which runs on a farm of 
commercial processors, using the full event data and performing the reconstruction 
and selection of physics objects such as electrons, photons, muons, t leptons, 
hadronic jets, and missing energy; see Sect. 2.4. 


2.1.3 The LHCb Trigger System 


Although the intersection point of LHCb is tuned to lower luminosity than the one 
of CMS and ATLAS, the number of produced bb pairs is still of the order of 10!! 
per year. An efficient, robust and flexible trigger is required in order to cope with 
the harsh hadronic environment. It must be sensitive to many different final states. 
While the average bunch crossing rate is 28.7 MHz, the frequency of events 
actually visible in the detector is about 13 MHz [7]. This must be reduced to about 
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Fig. 2.2 Scheme of the LHCb Trigger Run 2 
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1 MHz, the frequency at which the detector can be read out. As in CMS, the trigger 
is organized in two levels, Level-0 (LO) and HLT (see Fig. 2.2). 

The LO trigger is implemented in field-programmable gate arrays (FPGAs) with 
a fixed latency of Aus [8]. It has two components: the calorimeter trigger, which 
looks for particles (electrons, photons, neutral pions, hadrons) with large transverse 
energies; and the muon trigger which uses the tracks reconstructed in the muon 
chambers and selects the two muons with the highest transverse momentum for 
each quadrant of the muon detector. 

The HLT is implemented purely in software running on the nodes of the event 
filter farm [8]. It is subdivided in two stages, HLT1 and HLT2; see Sect. 10.4. HLT1 
does a partial event reconstruction. Its output, at a rate of about 100 kHz, is directed 
to HLT2, which performs the full event reconstruction and writes the results to mass 
storage. For more details on track and vertex reconstruction, see Sect. 10.4. 


2.2 Track Reconstruction 


Track reconstruction is a central task in the analysis of the event data. Its aim is to 
provide estimates of the track parameters, i.e., the position, the direction, and the 
momentum of charged particles at one or several specific points or surfaces. For an 
excellent review of track reconstruction, see [9]. 

Only the tracks of stable or sufficiently long-lived charged particles—for instance 
electrons, protons, muons and charged pions—are visible in the tracking detectors. 
Short-lived particles, for instance B hadrons or J/y mesons, are reconstructed from 
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their decay products. The reconstruction of charged particles can be divided into the 
following steps: 


A. Hit generation The recorded signals are converted to spatial coordinates, either 
2D or 3D, using the detector-specific calibration constants. The coordinates will 
be called “measurements”, “observations” or “hits” in the following. For some 
examples, see Sects. 1.2 and 1.3. 


B. Local track reconstruction First, tracks are reconstructed in each tracking 
system. This can again be divided into two or three steps. 


1. Track segment reconstruction. This step is relevant only for tracking systems that 
are in turn composed of several independent devices capable of giving at least the 
position and the direction of the particle and possibly the momentum. A typical 
example is the barrel muon system of the CMS experiment which consists of 
four muon stations, each with up to 12 layers of drift tubes. Although a muon 
station is not considered a fully-fledged tracking system in the context of track 
reconstruction, it provides enough hits to estimate the parameters of a straight 
line track segment [10]. 

2. Track finding. In this step, hits or track segments are clustered to track candidates, 
i.e., collections of hits, such that ideally all hits or segments in a collection are 
produced by the same particle. Track finding can be done iteratively, especially 
in the very-high multiplicity events recorded by the LHC experiments. In this 
approach, “easy” tracks with high momentum and small material effects are 
found first, while more difficult tracks are found in the subsequent passes 
(see Sect. 10.3). Algorithms for track finding are discussed in detail in Chap.5. 
Specific solutions by experiments are discussed in Chaps. 10 and 11. 

3. Track fitting. For each track candidate, the track model is fitted to the hits in order 
to get the best estimates of the track parameters. This is the topic of Chap. 6. The 
track fit also gives an indication of the quality of the fit, usually in the form of 
a chi-square statistic x”. An abnormally large value of the chi-square statistic 
indicates either a random combination of hits, i.e., a “fake” or “ghost” track, or 
the presence of outliers in the track candidate. Outliers can either be removed 
from the track or down-weighted by employing a robust estimator; see Sects. 6.2 
and 6.4.2. 


C. Global track reconstruction After the local track reconstruction, the tracks 
found in the individual tracking systems have to be combined to global track candi- 
dates. To this end, the track candidates accepted by the track fit in the main tracking 
system are extrapolated to the other tracking systems and checked for compatibility 
with the tracks reconstructed there. As the estimated track parameters in different 
tracking systems are stochastically independent, the test for compatibility is usually 
based on the chi-square statistic of their weighted mean, but more sophisticated 
machine learning methods can be applied as well [11]. The successful combination 
of local tracks to a global candidate is normally followed by a track fit of the latter. 

Some tracking systems do not have enough redundancy to allow stand-alone local 
track reconstruction, for instance a pixel vertex detector with only two layers. In this 
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case, the tracks found in the global track reconstruction are extrapolated and used to 
find compatible hits, which are then attached to the track; see Sect. 5.1.7. 


D. Assessment of track quality Not every track candidate generated by the track 
finding is a valid track. Testing the track hypothesis and assessing the track quality 
after the track fit is therefore mandatory. This is the topic of Sect. 6.4. 

Given a careful calibration of the tracking devices (Sects. 1.2 and 1.3) the chi- 
square statistic of the track fit is used to reject fake tracks (Sect. 6.4.1) and to 
trigger the search for outliers (Sect. 6.4.2) or kinks (Sect. 6.4.3). Another important 
ingredient to the assessment of track quality is the track length or the effective 
number of hits attached to the track after removal or down-weighting of outliers; 
see Sect. 6.4.1. Checking the compatibility with the collision or production region 
can also be used to reject fake tracks. 

If the track reconstruction proceeds iteratively, it is tempting to remove the hits 
used in an iteration from the hit pool to simplify the task of the subsequent iterations. 
It is, however, advisable to remove only those hits that are unambiguously attached 
to tracks of the highest quality. 


2.3 Vertex Reconstruction 


A point where particles are produced in a collision or a decay is called a vertex. 
The point of collision of two beam particles in a collider or of a beam particle 
with a target particle in a fixed-target experiment is called the primary vertex. In 
high-luminosity colliders, such as the LHC, many collisions occur in a single bunch 
crossing; consequently, there are many primary vertices. It is, however, statistically 
almost certain that at most one of the collisions generates a pattern recognized by the 
trigger as being of potential physical interest. The vertex of this collision is called 
the signal vertex. 

Many of the particles produced at a primary vertex, including the signal vertex, 
are unstable and decay at a secondary vertex. The aim of of vertex reconstruction is 
to find sets of particles that have been produced at the same vertex, to estimate the 
vertex position, test whether the assignment of the particles to the vertex is correct, 
and improve the estimates of the track parameters by imposing the vertex constraint. 
Vertex finding and vertex fitting are the topics of Chaps. 7 and 8, respectively. For 
the various types of secondary vertices and how to find them, see Chap. 9. Examples 
of experimental strategies are given in Chaps. 10 and 11. 


2.4 Physics Objects Reconstruction 


Both the trigger and the physics analysis require not just tracks, but objects that 
represent physical entities such as electrons, photons, muons, t leptons, jets, 
missing energy, etc. Object identification can be obtained by two complementary 
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approaches: dedicated detectors for particle identification (PID), and combining 
information from different sub-detectors. 


2.4.1 Particle ID by Dedicated Detectors 


Charged particles can be identified by dedicated detectors in various ways [12]. 


Measurement of the velocity Given the momentum as determined by the tracking 
system, the mass can be estimated. Velocity can be measured directly by time-of- 
flight detectors [13], or indirectly by measuring the emission angle of Cherenkov 
radiation in Cherenkov detectors [14] and time-of-propagation detectors [15]. 


Energy loss by ionization In a large range of velocity, the expected energy loss 
by ionization is proportional to (m/p)*, where m is the unknown mass and p is the 
momentum of the particle; see Sect. 4.5.2.1. In practice, the most probable energy 
loss is estimated from a number of measurements. In a silicon tracker, energy loss is 
measured in each sensor [16]; in a drift or time projection chamber, the energy loss 
is measured for each wire hit or for each cluster in the endplates, respectively [17]. 


Transition radiation Transition radiation (TR) is electromagnetic radiation in the 
X-ray band. It is emitted when an ultra-relativistic particle crosses the boundary 
between two media with different dielectric constants. The radiator is combined 
with a gaseous sensing device that measures the TR signal and the position of the 
particle [13, 18]. 


2.4.2 Particle and Object ID by Tracking and Calorimetry 


PID in dedicated detectors is complemented by combining information from the 
tracking systems and the calorimeters. 


Electrons Electrons and positrons are identified as such by the fact that they have 
a reconstructed track and a cluster in the electromagnetic calorimeter that matches 
the track in energy and position. For the special treatment of electrons in the track 
fitting, see Sects. 6.2.3, 10.2 and 10.3. 


Photons Clusters in the electromagnetic calorimeter that are not matched to a track 
or a cluster in the hadronic calorimeters are candidates for photons. 


Muons Global tracks with hits in both the central tracking system and the muon 
tracking system are candidates for muons. 


Jets Jets are narrow bundles of charged and neutral particles produced by the 
hadronization of a quark or a gluon. Jet reconstruction algorithms are based on 
clustering the charged tracks, but should also provide a good correspondence 
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between the energy deposits in the calorimeters and the reconstructed tracks. This is 
the aim of the particle flow method, which originated in the ALEPH experiment at 
the LEP collider [19], and is now employed by LHC experiments as well [20, 21]. 
In this approach, the energy of a charged hadron is estimated by a weighted average 
of the track momentum and the associated calorimeter energy. The energy of the 
photons is measured by the electromagnetic calorimeter, and the energy of neutral 
hadrons is measured by the hadronic calorimeter. 


Tau leptons Tau leptons have to be reconstructed from their decay products 
(Sect. 9.2). In two thirds of the cases, t leptons decay into hadrons, typically into 
one or three charged mesons (predominantly pions), often accompanied by neutral 
pions decaying into photons and an invisible neutrino. Therefore, the particle flow 
approach can be applied to t leptons as well [22, 23]. 


Missing energy Missing transverse energy is a signature for invisible particles such 
as neutrinos, dark matter, and neutral supersymmetric particles. In a typical collider 
experiment, it is a global quantity computed from the transverse momentum/energy 
components of all jets, electrons, photons, muons, and r leptons in the event. In the 
LHC experiments, it has to be corrected for contributions from the pile-up collisions 
in the same bunch crossing; see for instance [24]. 
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Chapter 3 N 
Statistics and Numerical Methods Geek for 


Abstract The chapter gives an outline of some statistical and numerical methods 
that will be applied in later chapters. The first section deals with the minimization of 
functions. Several gradient-based methods and a popular non-gradient method are 
discussed. The following section discusses statistical models and the estimation of 
model parameters. The basics of linear and nonlinear regression models and state 
space models are presented, including least-squares estimation and the (extended) 
Kalman filter. The final section gives a brief overview of clustering and different 
types of clustering algorithms. 


3.1 Function Minimization 


The minimization (or maximization) of a multivariate function F(x) is a frequent 
task in solving non-linear systems of equations, clustering, maximum -likelihood 
estimation, function and model fitting, supervised learning, and similar problems. 
A basic classification of minimization methods distinguishes between methods that 
require the computation of the gradient or even the Hessian matrix of the function 
and gradient-free methods. All methods discussed in the following subsections 
are iterative and require a suitable starting point xo. Implementations of the 
methods discussed in the following, along with many others, can, for instance, be 
found in the MATLAB® Optimization Toolbox™ [1] and in the Python package 
scipy.optimize [2]. 


3.1.1 Newton-Raphson Method 


If F(x) is at least twice continuously differentiable in its domain, it can be 
approximated by its second order Taylor expansion F at the starting point xo: 
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^ 1 
F(xo + h) © F(x) = F(xo) + g(xo) h + 2 h' H (xo) h, (3.1) 


where g(x) = VF (x) is the gradient and H (x) = V?F(x) is the Hessian matrix of 
F(x). The step h is determined such that F has a stationary point at x1 = xo + h, 
i.e., that the gradient of F is zero at x: 


VF (x1) = go) + hA A (x0) = 0 = h = -[H (xo) ! g(xo)'. (3.2) 


Note that if x is a column vector, g(xọ) is a row vector. In order to ensure that the 
Wolfe conditions [3] are satisfied, Eq. (3.2) is often relaxed to: 


h = —5|H (xo)! g (x0), (3.3) 


with a learning rate 7 € (0, 1). Inverting the Hessian matrix can be computationally 
expensive; in this case, h can be computed by finding an approximate solution to 
the linear system H (xo) h = — g(xo)!. This procedure is iterated to produce a 
sequence of values according to: 


Xii = Xk — HH (xD) ga)". (3.4) 


If the starting point xo is sufficiently close to a local minimum, the sequence 
converges quadratically to the local minimum. In practice, the iteration is stopped 
as soon as the norm | g (xx) | of the gradient falls below some predefined bound e. If 
F is a convex function, the local minimum is also the global minimum. 


3.1.2 Descent Methods 


As the computation of the Hessian matrix is computationally costly, various 
methods that do not require it have been devised, for instance, descent methods. A 
descent method is an iterative algorithm that searches for an approximate minimum 
of F by decreasing the value of F in every iteration. The iteration has the form 
Xk+1 = Xk comp dy, where dy is the search direction and nx is the step-size 
parameter. As with the Newton-Raphson method, when a (local) minimum is 
reached, it cannot be left anymore. 


3.1.2.1 Line Search 


A search direction d is called a descent direction at the point x € IR" if g(x)-d < 0. 
If n is sufficiently small, then 


F(x+nd) < F(x). (3.5) 
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Once a search direction d; has been chosen at the point xx, line search implies that 
the line x = xg + Ad, is followed to its closest local minimum. For various line 
search methods such as fixed and variable step size, interpolation, golden section or 
Fibonacci's method, see [4, 5]. 


3.1.2.2 Steepest Descent 


Steepest descent follows the opposite direction of the gradient. If it is combined 
with line search, each search direction is orthogonal to the previous one, leading 
to a zig-zag search path. This can be very inefficient in the vicinity of a minimum 
where the Hessian matrix has a large condition number (ratio of the largest to the 
smallest eigenvalue); see Fig. 3.1. 


3.1.2.3 Quasi-Newton Methods 


In the Newton-Raphson method, the search direction is d = —[H (KW! gan)". 
If H(xx) is positive definite, then d is a descent direction, and so is — A g(x)! 
for any positive definite matrix A. In a quasi-Newton method, A is constructed as 
an approximation to the inverse Hessian matrix, using only gradient information. 
The initial search direction is the negative gradient, dy = — g(xo)l, and the initial 
matrix Ao is the identity matrix. Each iteration performs a line search along the 
current search direction: 


F(x,y) = 22? + 16y? 


2 T m Sa 
==: | x Minimum H 


mn 


Fig. 3.1 Contour lines of the function F(x, y) = 2x?--16y?, and steepest descent with line search 
starting at the point xo = (3; 1) 


36 3 Statistics and Numerical Methods 
Àk = arg, mn (xk +AdK), Xk+1 = Xk + Akdy. (3.6) 


The new search direction is then computed according to: 


dig = -Artıg(&k41) , (3.7) 


where A44, is the updated approximation to the inverse Hessian matrix. There are 
two different algorithms for computing the update [6, p. 422], both using the change 
of the gradient along the step, denoted by y, = [g(xx41) — gap]. 


Davidon-Fletcher-Powell algorithm 


dyd] Ax yyy} Ak 


Ak+1 = Ar + A (3.8) 
dj y, YAK 
Broyden-Fletcher-Goldfarb-Shanno algorithm 
did] diy} di 
Ani AAT N 4%). (3.9) 
d, yx d, yi Yk 


3.1.24 Conjugate Gradients 


If the function F(x), x € R” is quadratic of the form F(x) = 5x Ax —b'x +c 
with positive definite A, the global minimum can be found in exactly n steps, if line 
search with a set of conjugate search directions is used. Such a set {dı,...,d„} is 
characterized by the following conditions: 


d}Ad;=0, iz j, d}Ad;#0, i=1,...,n. (3.10) 


The set is linearly independent and a basis of R”. An example for n = 2 is shown 
in Fig. 3.2. 

In the general case, the conjugate gradient method proceeds by successive 
approximations, generating a new search direction in every iteration. Given an 
approximation xx, the new search direction is d; = — g(x)" + k d«-ı, where 
do is arbitrary. A line search along direction dx gives the next approximation: 


Ag = arg, min(x, +Adx), Xk+1 = Xk + Ardk. (3.11) 


Different variants of the algorithm exist corresponding to different prescriptions for 
computing fg. Two of them are given here [6, p. 406-416]. 
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Fig. 3.2 Contour lines of the function F(x, y) = 2x? + 16y?, and descent with line search and 
conjugate gradients starting at the point xo = (3; 1). The minimum is reached in two steps 


Fletcher-Reeves algorithm 


go) ga) 
= , 3.12 
© Fee" = 
Polak-Ribiére algorithm 
u T 
by = ECP [g (Œ) — g(xi-1)] (3.13) 


goi gia) 


It is customary to set B; to zero if k is a multiple of n, in order to avoid accumulation 
of rounding errors. As such, convergence to the minimum in n steps is no longer 
guaranteed for non-quadratic F (x). 


3.1.3 Gradient-Free Methods 


A popular gradient-free method is the downhill-simplex or Nelder-Mead algo- 
rithm [7]. It can be applied to functions whose derivatives are unknown, do not 
exist everywhere, or are too costly or difficult to compute. In n dimensions, the 
method stores n+ 1 test points x1, ..., Xn+1 at every iteration, ordered by increasing 
function values, and the centroid xo of all points but the last one. The simplex 
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F(x,y) = (0.8 — x)? + 200(y — x”)? 
2 | j ZZZ ZI] 
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Fig. 3.3 Contour lines of the Rosenbrock function F(x, y) = (0.8 x)? + 200 (y x’)? and 
minimization with the downhill-simplex method starting at the point xo = (1.5; 1). With the 
tolerance 107? on the function value the minimum is reached after 90 steps 


generated by the test points is then modified according to the function values in 
the test points. The allowed modifications are: reflection, expansion, contraction 
and shrinking. The iteration is terminated when the function value of the best point 
does not change significantly anymore. The size of the initial simplex is important; 
choosing it too small can lead to a very localized search. On the other hand, it is 
possible to escape from a local minimum by restarting the search with a sufficiently 
large simplex. 

An example with the Rosenbrock function F(x, y) = (0.8—x)?+200 ( yaa is 
shown in Fig. 3.3. The function has a very shallow minimum at x = 0.64, y = 0.8. 
With the tolerance 107? on the function value the minimum is reached after 90 steps. 

Other gradient-free methods are simulated annealing, tabu search, particle swarm 
optimization, genetic algorithms, etc. 


3.2 Statistical Models and Estimation 


In the context of this book a statistical model is defined as a functional dependence 
of observed quantities (observations or measurements) on unknown quantities of 
interest (parameters or state vectors). The parameters cannot be observed directly, 
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and the observations are subject to stochastic uncertainties. The aim is to estimate 
the parameters from the observations according to some criterion of optimality. 


3.2.1 Linear Regression Models 


A linear regression model has the following general form [8]: 
m=Fp+c+e, E[e] 2-0, Varfe]=V=G", (3.14) 


where m is the (n x 1)-vector of observations, F is the known (n x m) model 
matrix with m < n and assumed to be of full rank, p is the (m x 1) vector of model 
parameters, c is a known constant offset, and e is the (n x 1) vector of observation 
errors with zero expectation and (n x n) covariance matrix V, assumed to be known. 

LS estimation of p requires the minimization of the following objective function: 


S(p) = (m — Fp— c)! G im — Fp—c). (3.15) 
The least-squares (LS) estimator p and its covariance matrix C are given by: 
p=(F'GF)!F'G(m-—c), C=(F'GF)"!. (3.16) 


The estimator p is unbiased and the estimator with the smallest covariance matrix 
among all estimators that are linear functions of the observations. If the distribution 
of e is a multivariate normal distribution, the estimator is efficient, i.e., has the 
smallest possible covariance matrix among all unbiased estimators. 

The residuals r of the regression are defined by: 


r=m-c-Fp, R=Var(r]=V—F(F'GF)'F'. (3.17) 


The standardized residuals s, also called the “pulls” in high-energy physics, are 
given by: 
ri 


J/Ri' 


If the model is correctly specified, the pulls have mean O and standard deviation 1. 
The chi-square statistic of the regression is defined as: 


D-—1..m. (3.18) 


Si = 


x? =r'Gr, with E[x?] —n-m. (3.19) 


If the observation errors are normally distributed, x? is x?-distributed with d = 
n — m degrees of freedom; its expectation is d and its variance is 2d. Its p-value p 
is defined by the following probability transform: 


40 3 Statistics and Numerical Methods 


pide J. sidr, (3.20) 
" 


where G;(x) is the cumulative distribution function of the X?-distribution with 
k degrees of freedom and g(x) is its probability density function (PDF). Large 
values of x? correspond to small p-values. If the model is correctly specified, 
p is uniformly distributed in the unit interval. A very small p-value indicates a 
misspecification of the model or of the covariance matrix V, or both. 


3.2.2 Nonlinear Regression Models 


The linear regression model in Eq. (3.14) can be generalized to a nonlinear model: 
m= f(p)--e, E[se] 0, Var[e] V = G^, (3.21) 


where f is a (n x 1)-vector of smooth functions of m variables. LS estimation of p 
requires the minimization of the following objective function: 


S(p) = [m — f ép] G [m — f (p»]. (3.22) 


The function S(p) can be minimized with any of the methods discussed in Sect. 3.1. 
The one used most frequently is probably the Gauss-Newton method, based on the 
first-order Taylor expansion of f and resulting in the following iteration: 


= z = 2 of 
Bua = Pr + (FLGF,) FIG [m - fp]. Fe= =|. - (3.23) 
OP | Px 
At each step, the covariance matrix Ck+1 of p, , is approximately given by: 
Cia = (FGF p~". (3.24) 


In general, the covariance matrix of the final estimate p can be approximated by the 
inverse of the Hessian of S(p) at p. The final chi-square statistic x? is given by: 


x! = [m - f(p)]' G [m - f]. (3.25) 


In the case of Gaussian observation errors, the chi-square statistic is approximately 
x? -distributed, and its p-value is approximately uniformly distributed. The iteration 
is stopped when the chi-square statistic does not change significantly any more. 
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3.2.3 State Space Models 


A dynamic or state space model describes the state of an object in space or time, 
such as a rocket or a charged particle [9]. The state usually changes continuously, 
but is assumed to be of interest only at discrete instances in the present context. 
These instances are labeled with indices from 0, the initial state, to n, the final state. 
The state at instance k is specified by the state vector q. The spatial or temporal 
evolution of the state is described by the system equation, which is Eq. (3.26) in the 
linear case and Eq. (3.41) in the general case. 


3.2.3.1 Linear State Space Models and the Kalman Filter 


In the simplest case, the state at instant k is an affine function of the state at instant 
k — 1 plus a random disturbance called the system or process noise, with known 
expectation and covariance matrix: 


qk = Far-ı-ı dk yj Ely] = ge. Var[yı] = Qr- (3.26) 


The process noise y, may affect only a subset of the state vector, in which case its 
covariance matrix Q% is not of full rank. 

In most cases, only some or even none of the components of the state vector 
can be observed directly. Instead, the observations are functions of the state plus an 
observation error. In the simplest case, this function is again affine: 


my = Hıq, tee +ex, E[ey] 2 0, Varlex] = Vy = Gr. (3.27) 


Process noise and observation errors are assumed to be independent. 

Given observations m, ..., m, , an initial state qo and an initial state covariance 
matrix Co, all states can be estimated recursively by the Kalman filter [10]. Assume 
there is an estimated state vector q,. 4 with its covariance matrix Cy. , at instant 
k — 1. The estimated state vector at instant k is obtained by a prediction step followed 
by an update step. After the last update step, the full information contained in all 
Observations can be propagated back to all previous states by the smoother. If both 
process noise and observation errors are normally distributed, the Kalman filter is 
the optimal filter; if not, it is still the best linear filter. 


Prediction step The state vector and its covariance matrix are propagated to the 


next instance using the system equation Eq. (3.26) and linear error propagation: 


diii = Fik-idy-i + dk gj. Cii = Fig-iCiA Fui + Qk. 
(3.28) 
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Update step The updated state vector is the weighted mean of the predicted state 
vector and the observation mg. There are two equivalent ways to compute the 
update. The first one uses the gain matrix Kx: 


dy = dy i Ke (me — ck — Hxdggy i), Cx =  — Kx Hi)Ciqka, 


(3.29) 

Kr = Cy A H} (Vk + Ae Cue Hy) |. (3.30) 
The second one is a multivariate weighted mean: 

di = Ck [ek + HIG, (mk = ex) | , (3.31) 

C, = (Cj, + HIGH". (3.32) 


The update step has an associated chi-square statistic X which can be computed 
from the predicted residual rgjk—1 or from the updated residual rg: 


rk|k-1 = my — ck — AG yp—1, Reir-ı = Var [rii] = Vie + AY Cy}, 


(3.33) 
ry = my — ck — Hyqq,. Ry = Var [rk] = Vy — HıCıH], (3.34) 
Xe = ripa Regi kk-i =r] Ry! ry. (3.35) 


If the model is correctly specified and both process noise and observation errors 
are normally distributed, x 5 is x?-distributed with a number of degrees of freedom 
equal to the dimension of mx. The total chi-square x2, of the filter is obtained by 
summing xÈ over all k. 


Smoothing The smoother propagates the full information contained in the last 
estimate q,, back to all previous states. There are again two equivalent formulations. 
The first one uses the gain matrix A, of the smoother: 


fkin = Ir + Aken — diii Ak = CEP pug Corp (3.36) 
Ckjn = Ck — Ag(Crpije — Cetin) AL. (3.37) 


This formulation is numerically unstable, as the difference of the two positive 
definite matrices in Eq. (3.37) can fail to be positive definite as well because of 
rounding errors. The second, numerically stable formulation realizes the smoother 
by running two filters, one forward and one backward, on the same sets of 
observations. The smoothed state is a weighted mean of the states of the two filters: 
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=j —1 
= = b ~b zi - b 
din = Cin E qy + (Chen) ah] ner (Chr) , 


(3.38) 


where Gi x41 ÍS the predicted state from the backward filter and C i k+1 its 
covariance matrix. Alternatively, the predicted state from the forward filter and the 
updated state from the backward filter can be combined. The smoother step has 
an associated chi-square statistic Xj n» Which can be computed from the smoothed 
residuals: 


Pkin = my — Ck — AgGyin, Rejn = Var[rxın] = V= AyCyin H], (3.39) 


2 T p-1 
Xk|n — rus Repl k\n- (3.40) 


If the model is correctly specified and both process noise and observation errors are 
normally distributed, Xin is x”-distributed with a number of degrees of freedom 
equal to the dimension of mg. As the smoothed state vectors are estimated from 
the same information, they are correlated. The prescription for computing their joint 
covariance matrix is given in [11]. 

The Kalman filter can also be implemented as an information filter and as a 
square-root filter [10]. 


3.2.3.2 Nonlinear State Space Models and the Extended Kalman Filter 


In the applications of the Kalman filter to track fitting, see Chap.6, the system 
equation is usually nonlinear; see Sect. 4.3 and Fig. 4.4. It has the following form: 


dk = fiia(dy D t yi. E[vi] = si. Var[yı] = Qr. (3.41) 


In the prediction step, the exact linear error propagation is replaced by an approxi- 
mate linearized error propagation; see also Sect. 4.4 and Fig. 4.5. 


OF ica 
ðar- [dii 
Ckik-1 = Fiji Ci Fu + Q,. (3.42) 


Irır-ı fidc + Be Fair-ı = 


More rarely, also the measurement equation can be nonlinear: 
m = hiq) + ex, Elex] — 9, Varlex] = Vk = Gx. (3.43) 
The first order Taylor expansion of hx at q k|k—1 gives: 


he (dy) © Hkqy + Cr, ex = Mk eye) — Hrgkır-ı- (3.44) 
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Using this Hx and c, in Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32) gives the 
update equations for the nonlinear measurement equation. If required, the update 
can be iterated by re-expanding hg at q, and recomputing Hg, ck and Kx. The 
smoother can be implemented via its gain matrix, Eqs. (3.36) and (3.37), or by the 
backward filter, Eq. (3.38). 


3.3 Clustering 


Clustering is the classification of objects into groups, such that similar objects 
end up in the same group. The similarity of objects can be summarized in a dis- 
tance/similarity matrix containing the pair-wise distances/similarities of the objects 
to be clustered. A cluster is then a group with small distances/large similarities 
inside the group and large distances/small similarities to objects outside the group. 
After clustering, the resulting cluster structure should be validated by measuring 
the internal consistency of each cluster. For further information and examples, the 
reader is directed to the literature [12-14]. 


3.3.1 Hierarchical Clustering 


Hierarchical clustering groups the objects with a sequence of nested partitions [12, 
Chapter 3]. If the sequence starts from single-object clusters and merges them to 
larger clusters, the clustering is called agglomerative; if the sequence starts from a 
single cluster containing all objects and splits it into successively smaller clusters, 
the clustering is called divisive. At any stage of the clustering, all clusters are 
pairwise disjoint. The number of clusters is not necessarily specified in advance, 
but can determined “on the fly”. If in divisive clustering a cluster is considered to 
be valid, further splitting is not required, but also not forbidden. If in agglomerative 
clustering the merging of two clusters results in an invalid cluster, the merger is 
undone. 


3.3.2  Partitional Clustering 


Partitional clustering directly divides the objects into a predefined number K of 
clusters, usually by optimizing an objective function that describes the global quality 
of the cluster structure [12, Chapter 4]. Partitional clustering can be repeated for 
several values of K in order to find the optimal cluster structure. In the fuzzy variant 
of partitional clustering, an object can belong to more than one cluster with a certain 
degree of membership [15]. 
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3.3.3 Model-Based Clustering 


In model-based clustering, the objects are assumed to be drawn from a mixture 
distribution with two or more components. Each component is described by a 
PDF and has an associated probability or “weight” in the mixture [16]. The 
parameters of the mixture are usually estimated by the Expectation-Maximization 
(EM) algorithm [17-19]; see Sect. 7.2.2. A by-product of the EM algorithm is the 
posterior probability ;, of object i belonging to cluster k, for all objects and all 
clusters. The result is again a fuzzy clustering. 
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Part II 
Track Reconstruction 


Chapter 4 A) 
Track Models Cheek for 


Abstract The chapter shows how the equations of motion for charged particles 
in a homogeneous or inhomogeneous magnetic field are solved. Various types 
of parametrizations are presented, and formulas for track propagation and error 
propagation are derived. As the effects of the detector material on the trajectory have 
to be taken into account, the statistical properties of multiple Coulomb scattering, 
energy loss by ionization, and energy loss by bremsstrahlung are discussed; then 
it is shown how the effects can be treated in the track reconstruction. As multiple 
scattering in thin layers and energy loss by bremsstrahlung have distinctive non- 
Gaussian features, an approximation by normal mixtures is presented. 


4.1 The Equations of Motion 


Consider a charged particle with mass m and charge Q = qe, where e is the 
elementary charge and q is an integer, usually g = +1. Its trajectory or position 
r(t) = (x(t), y(t), z6)! in a magnetic field B(r), as a function of time, is 
determined by the equations of motion given by the Lorentz force F « qv x B, 
where v = dr /dt is the velocity of the particle. In vacuum, Newton's second law 
reads [1]: 


dp 
7 kqv(t) x B(r(t)), (4.1) 


where p = ymv is the momentum vector of the particle, y = (1 — v?/c?)-1? 
is the Lorentz factor, and k is a unit-dependent proportionality factor. If the units 
are GeV/c for p, meter for r, and Tesla for B, then k = 0.29979 GeV/c T7! m-!. 
The trajectory is uniquely defined by the initial conditions, i.e., the six degrees 
of freedom specified for instance by the initial position and the initial momentum. 
If these are tied to a reference surface, five degrees of freedom are necessary and 
sufficient. Geometrical quantities other than position and velocity can also be used 
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to specify the initial conditions. The collection q = (q1, ...., gm) of these quantities 
is called the initial track parameter vector or the initial state vector. 

Equation (4.1) can be written in terms of the path length s(t) along the trajectory 
instead of r, giving [1]: 


d’r dr . 
qr ky - ds x B(r(s)) =T (s,r(s), r(s)) ; (4.2) 
where r(s) = dr/ds, v = q/p and p = |p|. In simple cases, this equation has 
closed-form solutions. In a homogeneous magnetic field, the solution is a helix; 
it reduces to a straight line in the limit of B = 0. In the general case of an 
inhomogeneous magnetic field, one has to resort to numerical methods such as 
Runge-Kutta integration of the equations of motion (Sect. 4.3.2.1), parametrization 
by polynomials or splines [1], or other approximations [2]; see Sect. 4.3.2.2. 

Equation (4.2) can be expressed in terms of other independent variables. For 
example, if the equations of motion are integrated in a cylindrical detector geometry, 
the radius R is a natural integration variable. In a planar detector geometry, the 
position coordinate z could be the variable of choice [2, 3]. 


4.2 Track Parametrization 


Different detector geometries often lead to different choices of the track parameters. 
However, the parametrization of the trajectory should comply with some basic 
requirements: the parameters should be continuous with respect to small changes 
of the trajectory; the choice of track parameters should facilitate the local expansion 
of the track model into a linear function; and the stochastic uncertainties of the 
estimated parameters should follow a Gaussian distribution as closely as possible. 
In order to fulfill, for instance, the continuity requirement, curvature should be 
used rather than radius of curvature, and inverse (transverse) momentum rather than 
(transverse) momentum. 

In a barrel-type detector system typical for the central part of collider experi- 
ments, a natural reference surface of the track parameters is a cylinder with radius 
R, centered around the global z-axis, which usually coincides with the beam line. 
The track parameters are, in this case, defined at the point of intersection P between 
the track and the reference cylinder. In such a system, one possible choice of track 
parametrization is the following: 


qı =4/PT, q2 — Q, q3 — tanA, qa = RO, q5 =z, (4.3) 


where pr = p cosà is the transverse momentum, & is the azimuth angle of the 
tangent of the track at P, X is the dip angle (complement of the polar angle) of 
the tangent at P, and RO and z are the cylindrical coordinates of P in the global 
coordinate system, see Fig. 4.1. 
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Fig. 4.1 Track > track 
parametrization according 

to Eq. (4.3). The parameter 
tan A is the slope of the 
tangent at the reference point 
with respect to the 


(x, y)-plane 


Fig. 4.2 Track 
parametrization according 

to Eq. (4.4). The parameters 
dv/du and dw/du are the 
direction tangents of the track 
at the reference point with 
respect to the u-axis 


In a detector system based on planar detector elements, the natural reference 
surface is a plane. Such a surface is uniquely determined by a normal vector of 
the plane and the position of a reference point inside the plane. A local coordinate 
system is defined such that the u-axis is parallel to the normal vector and the v- and 
w-axes are inside the plane. A natural choice of track parameters is now 


qı = V, q2 = dv/du, q3 = dw/du, q4 = v, qs = w, (4.4) 


where Y = q/p, dv/du is the tangent of the angle between the projection of the 
track tangent into the (u, v)-plane and the u-axis, dw/du is the tangent of the angle 
between the projection of the track tangent into the (u, w)-plane and the u-axis, and 
v and w are the local coordinates of the intersection point of the track with the plane; 
see Fig. 4.2. The quantities dv/du and dw/du are also called direction tangents, as 
the tangent vector to the track is proportional to the vector (dv/du, dw/du, 1)". 
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Fig. 4.3 Track 
parametrization according 

to Eq. (4.5). The parameters 
x, and y, are the distances 
from the reference point in 
the plane perpendicular to the 
track. The direction of the 
tangent is measured in global 
polar coordinates 


tangent 


reference point 4 


Another planar reference system is the curvilinear frame, which is useful 
for transporting uncertainties of track parameters; see Sect.4.4. It is a hybrid 
local/global reference frame. The curvilinear plane is always orthogonal to the 
direction of the track with the parametrization: 


qı = V. q2— 6. d3 =À, qa — X1. 45 = Y1, (4.5) 


where x, and y, are orthogonal position coordinates inside the plane, see Fig. 4.3. 
The x , -axis is parallel to the global (x, y)-plane. The azimuth and dip angles & and 
à are defined at the point of intersection of the track with the curvilinear plane, but 
their values are measured in the global Cartesian coordinate system. The tangent 
vector is thus proportional to (cos A cos $, cos A sind, sin A. 

In addition to the above-mentioned, surface-based frames, a global, Cartesian 
coordinate frame is also frequently used. In this frame, the track parameters are the 
position vector r and the momentum vector p at that point. Since these are not tied 
to a surface, six parameters are needed in order to uniquely specify the state of the 
track. The rank of their covariance matrix is at most five. 


4.3 Track Propagation 


The track model, given by the solution of the equations of motion, describes the 
functional dependence of the state vector q ; at a surface j on the state vector q; at 
a different surface i: 


qj — "m (a;). (4.6) 
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surface i surface j 


Fig. 4.4 Track propagator from surface i to surface j 


The function fj; is called the track propagator from surface i to surface j, 
see Fig. 4.4. When closed-form solutions of the equations of motion exist, e.g., in 
the two situations of vanishing magnetic field and homogeneous magnetic field, 
the track propagator can be written as an explicit function of the path length. In 
the straight-line solution of the equations of motion in a vanishing magnetic field, 
it is easy to also derive an analytical formula for the path length between two 
surfaces. For the helical solution in a homogeneous magnetic field, however, such 
an analytical formula exists only for propagation to cylinders with symmetry axis 
parallel to the field direction or to planes orthogonal to the field direction. Otherwise, 
a Newton iteration or a parabolic approximation has to be used to find the path 
length. 


4.3.1 Homogeneous Magnetic Fields 


The helical track propagator takes the solution to Eq. (4.2) as a starting point. The 
solution [4] can be written as: 


sin 


r(s) =r0+ 2 @-sinß)-h+ -to + = (1 — cos 8) no, (4.7) 
where r (s) is the position vector of the point on the helix at path length s from 
the reference point ro (ats = 0), h = B/|B| is the normalized magnetic field 
vector, f = p/p is the unit tangent vector to the track, n = (h x t) /a witha = 
|h x t|, ó— h-t, K= —vy|B|, and 0 = Ks. In the following, the subscript “0” 
indicates quantities defined at the initial point s = 0. Any point along the trajectory 
can be specified by a corresponding value of s. The equation of the unit tangent 
vector ¢ is found by differentiating Eq. (4.7) with respect to s, 
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dr(s) . 
t(s) = —; 7 oC — cos8) - h + cos0 - to + æ sin? «np. (4.8) 
5 
For a given value of s, any desired set of track parameters can be calculated 
from Eq. (4.7) (positions) and Eq. (4.8) (directions). In the helical track model, the 
momentum p is constant and therefore has the same value for all s. 


4.3.2 Inhomogeneous Magnetic Fields 


In an inhomogeneous magnetic field, the equations of motion have no exact closed- 
form solutions, and one has to resort to numerical, approximate solutions. 


4.3.2.1 Runge-Kutta Methods 


Runge-Kutta methods are iterative algorithms for the approximate numerical 
solutions of ordinary differential equations, given initial values. Among Runge- 
Kutta methods, the Runge-Kutta-Nyström algorithm is specifically designed for 
second-order equations such as Eq. (4.2). In the fourth-order version a step of length 
h, starting at s = s„, is computed by [1]: 


Pat = Fn thin +h? (ky + k2 + k3)/6, 


, : (4.9) 
Pott = Fn + h(ky +2k2 + 2k3 + k4)/6, 
with the intermediate stages k defined by 
kı = T (sn, fn, Fn), 
k2 = F (sn +h/2, rn + hin/2 + h7ky/8, Pn + hk} /2), 
(4.10) 


k3 = I (sn + h/2, rg + hřtn/2 + h?kı/8, ry + hk2/2), 
ka = I (Sn +h, rn + hin + h?k3/2, Fn + hks), 


where rn is the position of the particle at s = s,, 7; is the unit tangent vector, and T 
is defined in Eq. (4.2). The magnetic field needs to be looked up three times per step, 
at the positions Fn, Fn +htn/2+h?k1/8, and rn + hin +h?k3/2. If the field at the 
final position r„+1, which is the starting position of the next step, is approximated 
by the field used for k4, only two lookups are required per step. 

If integration variables other than the path length s are used, for instance the 
radius R or the position coordinate z, the integration equations Eq. (4.9) are very 
similar. The difference is that the step length /; must be expressed in terms of R or z, 
and that the function I has a different form when expressed in other variables [3]. 
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If the field is (almost) homogeneous, as for example in a solenoid, the step size 
h can be chosen to be constant; otherwise a variable step size, taking into account 
the local inhomogeneity of the field, is more efficient. Determining the variable step 
size along the propagation is done by a step-size selection algorithm. The essence of 
such an algorithm is to assess the local error € of the Runge-Kutta step and compare 
it to a user-defined error tolerance t. The so-called embedded Runge-Kutta pairs, 
originally invented by Fehlberg [5], provide a measure of the local error in an elegant 
way by producing solutions of different orders during the same step with little extra 
computational cost. The higher-order solution is denoted r„+1, whereas the lower- 
order solution is denoted r„+1. The difference 


€ = rai — Patil (4.11) 


between these solutions constitutes a measure of the error of the step. A popular 
algorithm for the step size h,+1 of step n + 1 is [6] 


rA l/G-D 
) ; (4.12) 


An+ı =hy e 
€ 


where hn is the size of step n and q is the order of the lower-order solution. This 
algorithm will effectively shorten the step size if the local error is larger than the 
tolerance and lengthen it if the local error is smaller than the tolerance, forcing the 
local error to oscillate around the tolerance. 

The original version of the Runge-Kutta-Nystróm algorithm is not of the 
embedded type and therefore contains no direct recipe for estimating the local error. 
It was realized by the authors of [7], however, that a fourth-order derivative of r 
can be formed by a combination of the various stages calculated along the step. 
This derivative is implicitly the difference between a fourth-order solution and a 
third-order solution and can therefore be used as a measure of the local error e: 


€ = h? |k, — kz — ka + kal. (4.13) 


This error measure can be used for step-size selection according to Eq. (4.12) and 
constitutes an adaptive version of the Runge-Kutta-Nystróm algorithm when used 
alongside the integration steps in Eq. (4.9). 


4.3.2.2 Approximate Analytical Formula 


An approximate analytical formula for track extrapolation in an inhomogeneous 
field is described in [2]. The magnetic field B(z) is assumed to depend only 
on the z-coordinate. The particle is assumed to move along the z-axis, and the 
track parameters are x, y, tx, ty, v, where tx, ty are the direction tangents. In this 
parametrization, the equations of motion read: 
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X = ty, 
y = ty, 
t. = h- [tytyBy — (1 + 2)By + ty B3] = az) - BC), (4.14) 


t =h-[(l Ttf 2)B = tyty B2 — ty B3] = b(z)- B(z), 
y’ =0, 


1/2 
where h = ky (1 + oa + i and the prime denotes differentiation with respect 


to z. The aim is to find formulas for the extrapolation of (tx, ty) from zo to Ze; 
the extrapolation of x and y can then be performed by integration of the direction 
tangents. 

Let T(z) = T(t, (z), t,(z)) be a function of t, and tz. Then 


aT aT aT 
Pe ey pu ge eee -y T;, Bj. (4.15) 
Oty T ct 


E re x 
Tj = ot 5 + Ot fy = ar b -y Tii Bi,. (4.16) 
s x in=1 


When this process is continued, the following functions can be defined recursively: 


3 


Th ig =D, Th Bas (4.17) 
i=l 
OT il ƏT; i 
I = gp tgp Pe (4.18) 


The exact relation 


Ze 


Paate / T'(Odz (4.19) 


20 
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can then be expanded into: 


3 Ze 
T (ze) = T (zo) + >| Ty (21) Bi dai = 


i=l 20 


3 Ze zl 
= T(zo) + >| (n.o «f Ti, (da) Bi (z1)dz1 = 


ip=1* 60 0 


3 Ze (4.20) 
= Teo) + Tu | By G)dzi+ 
zo 


ij-l 


3 Ze a 3 
ES >} B, f >; Tii, (Z2)dz2dz1 
20 


a <0 in=1 


If the expansion is terminated after n steps, the compact form of Eq. (4.20) reads: 


n Ze Zk—1 
T (Ze) & T(zo) +9 Y; Taio) X (ie B, Oda... dai) E 
zo Zi 


k=1 i1...ik “0 
kB (Ze — zo)! 
+0 ( n ~ ) (4.21) 


The integral over the field components is taken along an approximate trajectory. 
Setting T = f, and T = ty gives the desired extrapolation formulas. For a 
comparison with a Runge-Kutta solver, see [2]. 


4.4 Error Propagation 


The task of transporting the covariance matrix of the track parameters is essential 
for any track reconstruction algorithm, either from one set of track parameters to 
another in the same reference surface, or during track propagation from one surface 
to another. 

This so-called error propagation is straightforward when the transformed track 
parameter vector is a strictly linear function of the initial track parameter vector. 
The track propagator in Eq. (4.6), however, is in general a non-linear function, 
and exact error propagation is not feasible. The common solution is approximate 
linearized error propagation; see Sect. 3.2.3.2 and Fig. 4.5. The derivatives of the 
track propagator f jį; are collected in the Jacobian matrix Fj ;: 
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Var [qi] = Ci — Var [a;] RI FuC;F\, 


| ai = faq) 


surface i surface j 


Fig. 4.5 Error propagation from surface i to surface j. The error bars around the track positions 
and the cones around the direction vectors symbolize the growing uncertainty of the track 
parameters 


0q; 
Fjj— +. 


4.22 
óq; (4.22) 


The covariance matrix C; of the track parameters at surface i is transported to 
surface j according to: 


Cj x Fjj; C; F] 


j" (4.23) 


A general method for computing the Jacobians is numerical differentiation [8, 
Section 5.7], by propagating a reference track and five other, nearby tracks from 
surface i to surface j. Consider a reference track with parameter vector q; at surface 
i. A small variation of component / (| = 1,...,5) ing; is introduced by adding to 
q; the vector A; = (öırhı,..., ösıhr)". The corresponding change in parameter k 
(k= 1,...,5)at surface j is 


An = fk(d; + AD — fk (Qi), (4.24) 


where f; is component k of the track propagator f ;;;. The elements (F;j;)y of the 
numerical Jacobian matrix F;|; are then obtained by evaluating 


Aki 
(Fjli)k = g (4.25) 
l 


This procedure works for all track propagators, irrespective of whether or not they 
are in closed form. 
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4.4.1 Homogeneous Magnetic Fields 


The exposition in this section closely follows the treatment in [4]. In a homogeneous 
magnetic field, it is possible to obtain analytical formulas for the Jacobians defined 
in Eq. (4.22). The problem of calculating transport Jacobians from one plane of 
arbitrary orientation to another is naturally decomposed into three separate parts 
because the error propagation from one spatial location to another is performed most 
easily in a coordinate frame which moves along the track, i.e., the curvilinear frame 
introduced in Sect. 4.2. Therefore, the natural decomposition is first a transformation 
from a local coordinate system to the curvilinear frame at the initial surface, then 
a transport within the curvilinear frame to the destination surface, and finally a 
transformation from the curvilinear frame to a local frame at the destination surface. 
The total Jacobian is the matrix product of the three intermediate Jacobians. 

The starting point of calculating the transport Jacobians are differentials relating 
variations of position, direction, and momentum at the initial point (s = 0) 
to variations of the same quantities at any other point along the helix. These 
differentials are given by 


a aig oire ae (4.26) 
dro dto dpo ds 

Bi MEL a OE ole, (4.27) 
Oto Qo Os 


where dwo is the variation of the signed inverse momentum at the initial point and 
ds is the change in path length of the helix due to the variations at the initial point. 
An illustration of this effect is shown in Fig. 4.6. 

The partial derivatives are obtained by direct differentiation of Eqs. (4.7) and 
(4.8). The results are: 


or 
—— : dro = dro, (4.28) 
oro 
or 0 — sin sind 1 —cosd 
— -dtg = ————— - (h: dto) - h + —— -dt — — —— . (h x dto), 
ar 0 x ( DES r o+ K (h x dto) 

(4.29) 

ua ER ee J- dy (4.30) 
—. = — Ís- ro—r]- ; ; 
Wo 0 V 0 0 
or 
— -ds = t - ds, (4.31) 
os 

ot : 
ga, MU e cosb- dio + AIR): (4.32) 

0 

ot dy; aks di (4.33) 
rC— 0 -— .N- 0; i 
0 Vo 
ot 


—-ds=akK-n-ds. (4.34) 
as 
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dr 


original track : 
„ds 


displaced track 


dro 


Fig. 4.6 A track and the displaced track due to a variation dro are shown. In the error propagation, 
the change ds of the path length has to be taken into account. In this specific case, dr is understood 
to be perpendicular to the track (see also Eq. (4.41)) 


4.4.1.1 Transformation from One Curvilinear Frame to Another 


The curvilinear frame is uniquely defined at each point along the track by three 
orthogonal unit vectors u, v and t, defining a coordinate system (x1, yj, z4). The 
vector £ has been defined above as the unit vector parallel to the track, and pointing 
in the particle direction. The two vectors u and v are defined by 


zxt 
u= ,v-—txu, (4.35) 
|z x t| 


where z is the unit vector pointing in the direction of the global z-axis. This means 
that the z |, -axis is pointing along the particle direction, the x | -axis is parallel to the 
global (x, y)-plane, while the y ; -axis is given by the requirement that the three axes 
should form a Cartesian, right-handed coordinate system. The relations between the 
momentum components ( Px» Py» Pz) in the global Cartesian frame and the angles 
are: 


Px = pcosAcos $, 
Py = pcosAsino, (4.36) 


Pz = psinà. 
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The Jacobian of the transformation from a curvilinear frame (y, $, A, x1, y1) at 
so = 0 to the same set of parameters at path length s is then derived by forming the 
differentials dr and dt, introducing the specific constraints given by the curvilinear 
frames, 


dro = uo: dx Lo + vo: dyio, (4.37) 
Oto Oto 

dto = —— -ddo + — - dào = cos Ao - uo - doo + vo - dào, (4.38) 
ado dA 

dr =u-dx, +v- dy, (4.39) 

dt = cos à - u - dọ + v - dA. (4.40) 


Moreover, since dr is now defined to be a variation in a plane perpendicular to the 
track, the functional dependence of ds on the variations of position, direction, and 
momentum at the initial point can be evaluated by multiplying Eq. (4.26) with t and 
using the constraint dr - t = 0. One obtains 


ds = —t - dr (Z an) G T ) dy (4.41) 
s= 0 au to In 0- ; 


Inserting Eq. (4.41) and Eqs. (4.28) — (4.34) into Eqs. (4.26) and (4.27), making 
use of Eqs. (4.37) and (4.40), yields a set of equations relating variations of the 
parameters at the initial surface to the variations at the destination surface. These 
can then be manipulated in a straightforward manner to yield the differentials of 
the parameters at the destination surface. Since, for instance, the differential dx | is 
defined as 


Ox, Ox, Ox, Ox, Ox, 
dij = LE. dyo + 2," dio. + diy deg — ds, 
0 Vr apo do 0x 10 9y10 
(4.42) 


the different terms in the desired Jacobian can be identified as the multiplying factors 
of the variations at the initial surface. A complete list of these can be found in 
Appendix A. Since a perfectly helical track model is assumed, the momentum at the 
destination surface is equal to the momentum at the initial surface. 


4.4.1.2. Transformations Between Curvilinear and Local Frames at a 
Fixed Point on the Particle Trajectory 


Consider the transformation from a local plane— defined by a unit vector i normal to 
the plane and two, orthogonal unit vectors j and k inside the plane—to the curvilin- 
ear frame. These three unit vectors define the coordinate system (u, v, w) introduced 
in Sect. 4.2. The aim is to derive the Jacobian of the transformation between 
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(V, v', w', v, w), where v' = dv/du and w = dw/du, and (Y, $, A, x 1, y1) at 
a given point s on the particle trajectory. The relevant differentials are now 


dr=u-dx,+u-dyy = j -dv +k -dw +t - ds, (4.43) 
ot j ot , 0t 

dt = cos À - u - dọ + v - dà = —-dv + -dw + — - ds. (4.44) 
av’ Ow’ Os 


Again dr is orthogonal to f, therefore ds can be calculated from Eq. (4.43), in a 
similar manner as before. The result is 


ds = — (t- j) -dv — (t- k) dw. (4.45) 


By inserting Eq.(4.45) into Eqs. (4.43) and (4.44) and following the same 
procedure as in Sect. 4.4.1.1, explicit expressions of the differentials can again be 
constructed. For this also £ and its derivatives, expressed in the local parameters, are 
needed. The formulas are: 


joe pog. (4.46) 


V 14 v? + w? 


ot 1 ; v' 
-= 5 5 J 5 st ; (4.47) 
Qv — 1p v? 40! V 12- v2 +w 


ot 1 w' 
= k t|. (4.48) 
ðw — 1p v? +w? V1 v? +w? 


The Jacobian of the transformation from the curvilinear to the local frame can 
be derived by inverting the Jacobian of the transformation from the local to the 
curvilinear frame. The expressions of both Jacobians can be found in Appendix A. 


4.4.1.3 Transformations Between Global Cartesian and Local Frames 


The global Cartesian frame (px, Py, Pz, x, y, Z)is useful for vertex reconstruction 
purposes as well as for track reconstruction in zero and inhomogeneous magnetic 
fields. Zero magnetic field is typical in, for instance, test-beam applications. 

First the transformations between a local frame and the global Cartesian frame is 
considered. The basic strategy is to go via an intermediate Cartesian frame aligned 
with the local frame under consideration. This Cartesian frame is related to the 
global Cartesian frame via a pure rotation. The intermediate Cartesian frame C’ will 
be denoted by primed quantities (pz, pP, pz, x’, y', z^), where the x’- and y'-axes 
are parallel to the v- and w-axes of the local frame, respectively. The Jacobian R 
of the transformation from the global, Cartesian frame to the intermediate frame 
C’ consists of two similar, three-by-three blocks, each containing the relevant 
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rotation matrix. The other entries of this matrix are zero. The Jacobian Jc’=ı 
of the transformation from the primed Cartesian frame C’ to the local frame L is 
constructed from the following formulas: 


E 4 (4.49) 
PoP py? + 
$ 
v = Ps, (4.50) 
p. 
p! 
w = —. (4.51) 
Pz 


The total Jacobian is given by the product Jc.,- R. 
For the inverse transformation, the Jacobian J;_.c’ can be derived from the 
following relations: 


f 


en a Sz V 

Py rn M 

,_ 4 . Sz: w 

P7. ieee ue = 
1 : (4.54) 


ZEN z 
PEU. Vip 


where s; = sign(p;) is the sign of the z-component of the momentum vector in the 
local frame. It is needed in order to uniquely specify the state of the track in the 
local frame. The total Jacobian is in this case given by the product RT - J; c, 
using the same matrix R as above. The Jacobians J c, and J zc are shown in 
Appendix A. 


4.4.2 Inhomogeneous Magnetic Fields 


As mentioned above, the common way of obtaining the Jacobian matrices needed 
for the linearized error propagation is to expand the track propagator functions 
to first order in a Taylor series. As there are no analytical expressions for the 
track propagator in the case of inhomogeneous magnetic fields, this approach is 
unfortunately impossible. Another common technique is the error propagation by 
numerical derivatives described earlier. This method is slow but robust and accurate. 
A third way of obtaining the Jacobian matrices is to differentiate the recursion 
formulae of the numerical integration method directly. This is the essence of the 
so-called Bugge-Myrheim method [3]. 
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In the parameter propagation, the propagated global track parameters: 
x 
r=|y|, r=t= [hn], (4.55) 
z 


are obtained by integrating the equations of motion, using some recursion formulas. 
With the Runge-Kutta-Nyström method, one step (numbered by n) becomes: 


R h? ; 
Pati =n hfn + -Cki + ka + ks) = Fan. Fn), (4.56) 


Patt =tnt+ z +2k2 + 2k3 + k4) = Gn (fn, r4). (4.57) 


To obtain the Jacobian matrix of the propagated global track parameters with respect 
to the initial track parameters q;, the recursion formulae (Eq. (4.57)) have to be 
differentiated with respect to q;, giving: 


OFn+1 OF, OF, OF yn Orn 
04; 0qj Orn Of, qj 

= " = = * H = D á , 

cid EE 9G, 9G, 8G, din dis 
0q; 0q; Or, Orn 0q; 


(4.58) 


where the derivatives 0r, /0q; and 0r, /0q; of the 6 x j Jacobian J are given by 
the following 3 x j matrices: 


O Xp OXn Oty n Oty, n 

0qi 0qi, j . dqi,ı 9qi, j 
edi m s p (4.59) 
94i Izn üz, Mi lon, ten 

0qi4 0qi, j ðqi,ı ðqi, j 


D, is a 6 x 6 matrix containing the recursion formulae F,, and G, differentiated 
with respect to the global track parameters: 


OF, OF, 

9(F5, Gn) Orn An 
=ne = 4.60 
O(n. Tn) 9G, 9G, ( ) 
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giving the 3 x 3 matrices 


ə F,1 OF, 9 Fs 1 e OF yn 
Ot ot 
gu, qme e] aur |, Tee (4.61) 
Orn Y ^ Of. 7 : l 
OF n3 ER OF n,3 Ə Fn,3 d ð Fs 
OXn OZn Otx,n Ofz n 
and 
0Gg,1 - 9G, 9G, n 9G,1 
Ox az Oty, Ot;. 
0Gn | o 077 8G, | 7.0 07 (4.62) 
fs jus. ns On IGn3 6 
OXn Oz, Ofy n Otz n 


By writing the recursion formulae of the derivatives as a product of D; and Jn 
(Eq. (4.58)), the recursion formulae F; and G, can be differentiated with respect to 
the global track parameters r; and r, instead of the initial track parameters q;. This 
greatly simplifies the differentiation, giving: 


OF h? (dk, Oko ðk 
n E 1+4 ( 1 2 ER J 
Orn 6 NOr, Or, Orn 
oF h? (dk, Oko Ok 
r r r r 
n n n n (4.63) 
aG h (dk ak Ok, ðk 
no ( u ig :. 
Orn 6 \Orn orn orn Orn 
h (ak k k k 
alee SML p a AE 
Orn 6 \ Orn Orn Or, Or, 


In order to calculate these derivatives explicitly, the individual stages of the Runge- 
Kutta-Nystróm method have to be differentiated with respect to the global track 
parameters: 


ðk c 9h 


Aj ==, ı= i 
Orn Orn 


(4.64) 


where / denotes the individual stages, and k; is given by the equations of motion of 
the global track parameters in Eq. (4.2): 
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d2x " 

Fe, = é (f B; — t By), 

Cy 4.65 
de> = é (t Bx — t Bz), =) 
d*z 

= = EG By 5B), 


where £ = kw. Writing the 3 x 3 matrices A; and C; in a general form yields: 


and 


with 


0x" Ox" 
Ot, IG 0 EB, —B, 
> ot [2[|-6B 0 EB, (4.66) 
Oz" az” E By —é By 0 
Ot, Ot; 


(t Bz;x — kBy;x) (Bzy — £By;y) (Bz; — kBy;z) 
zi (£ By. E t Bz.x) (t£ By; y = t Bz. y) (t£ By... = t Bz.z) E 
(& By;x m ty By; x) (& By, m ty Bx;y) (t, By.- Ez ty Byz) 


(4.67) 


0 B 
Busy = A 


, u,v € {x, y, Z}. (4.68) 


With the help of these matrices, the elements of the matrix D, in Eq. (4.60) are 
computed. D, is then multiplied by J„ to produce the transported Jacobian Jn+1; 
see Eq. (4.58). This procedure is repeated for every recursion step, transforming J 
along the way. If, for instance, a planar detector element is reached at the end of 
the propagation steps and a measurement is to be included in a track reconstruction 
algorithm, the Jacobian can be transported from the global parameters to a local set 
of parameters as described in Sect. 4.4.1.3. 

When applied to real problems, the field gradients in C are usually quite costly 
to calculate. The calculations can be significantly sped up by setting elements of C 
with negligible influence on the Jacobians to zero. The sensitivity of the Jacobians 
to the field gradients can be checked by a simulation study. 
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4.5 Material Effects 

4.5.1 Multiple Scattering 

4.5.1.1 The Distribution of the Scattering Angle 

Elastic Coulomb scattering of particles heavier than electrons, including the muon, 


is dominated by the atomic nucleus. The PDF of the single scattering angle 0 has 
the following form [9]: 


k0 
PU SUED if0 <b, 
f@Q)=4 +a) (4.69) 
0 otherwise, 
with the normalization constant 
k=2a? (1 + a? /b*) (4.70) 


The PDF in Eq. (4.69) is derived from the Rutherford scattering formula by setting 
sind 7 0 and introducing a minimal and a maximal scattering angle, thereby 
avoiding the singularity at 0 = 0. The minimal and maximal angles a, b depend 
on the nuclear charge Z and the atomic mass number A of the nucleus as well as on 
the momentum p of the scattered particle [9]: 


2.66.1079. ZU/3 0.14 
a= B 


p EL 


(4.71) 


Here and in the following, p is assumed to be given in units of GeV/c. The ratio 


204 M? 
p=bja= | m (4.72) 


depends only on the nuclear charge Z, with the exception of very heavy nuclei, 
where the approximation A ~% 2Z breaks down. The size of p is typically of the 
order 10^. The normalization constant k can therefore be approximated by k ~ 2a?. 
It follows that the expectations of 0 and 0? are approximately given by: 


4.18 - 1076 . 21/3 2.84- 10-11. 72/3. 1n(159 Z7 1⁄3 
E[o] 2 —— — — —, E[9°] = z ar ) 
p p 


(4.73) 


In track reconstruction, the scattering angle 0 is of little use, as it is more 
convenient to work with two projected scattering angles instead, for instance, 
0, = 0 cos $ and 6, = 0 sind, if the particle runs parallel to the z-axis. Under 
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the assumption that the azimuthal angle $ is independent of 0 and uniform in the 
interval [0, 27], their joint PDF is given by [10]: 


1 a? 
"(E MEC CE E da monu (4.74) 
0, otherwise. 


The support of the joint PDF is a circle around the origin with radius b. The 
projected angles 6, and 6, are uncorrelated but not independent. The marginal PDF 
of the projected angle in the interval [—b, b] has the following form: 


a /p — 82 arccos (Voy + a? [vat x 8!) 
05) = P , 
ipe (a? + b?) (02 + a?) (02 + a?)3/2 


(4.75) 


with 6, either 6, or 0y. The marginal PDF has zero mean, a single mode at 0p = 0 
with f(0) ~ 1/(2a), and the following variance: 


1.42 - 107!! . Z? . In(159 Z713) 
— pi A 


var [05] = E[0 21/2 (4.76) 


The range of the distribution is very large. For silicon (Z = 14), it is about +2500 
standard deviations. 

If the projected scattering angle is still small after N scattering processes, it 
is approximately equal to the sum of the individual projected scattering angles. 
Its distribution can be obtained either by the convolution of N single scattering 
distributions [10] or by the Molière theory [11-13]. Figure 4.7 shows the two PDFs 
for a muon with p = 1 GeV/c and four silicon scatterers. Their thickness d is given 
in fractions of a radiation length [14]. The corresponding numbers of scattering 
processes in the four scatterers have been chosen as N = 210 213 216 and 219, 
respectively, spanning the range from about 0.2% to about 90% of a radiation 
length. There is excellent agreement for all scatterers but the thickest one; in this 
case, the tails are more persistent in the Moliere PDF than in the PDF obtained by 
convolution. The discrepancy is at angles larger than the maximum scattering angle 
b, which is equal to about 46 mrad in silicon for p = 1 GeV/c. 

For the purpose of track reconstruction, the “exact” distribution of the projected 
scattering angle has to be approximated by a single Gaussian or a mixture of 
Gaussians; see Sect. 6.2.3. The most commonly used approximation by a single 
Gaussian is the Highland formula, proposed in [15] and modified in [16]. The most 
up-to-date version can be found in [14], which gives the standard deviation of the 
projected scattering angle as: 
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Fig. 4.7 Probability density functions of the projected multiple scattering angle for silicon targets 
obtained by convolution (solid lines) and frequency distributions obtained by simulation from the 
Moliére densities (dots). The vertical dashed lines in figure (d) show the upper limit b ~ 46 mrad 


in silicon 


v var[65] = —— — E xl [1 +0.0381n ea 3ll (4.77) 


where p, p and z are the momentum, velocity, and charge number of the incident 
particle, and d/ Xo is the thickness of the scatterer in units of the radiation length. 
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If the scatterer consists of a composite material with k components, its radiation 
length Xo is given by: 


k 
—— 2 i (4.78) 


where X; is the radiation length of component i in g/cm? and f; is the mass fraction 
of component i [17]. In the following, it is more convenient to express d and Xo in 
centimeters: 


Xo _ g/cm? Xo 


cm oe g/cm?' 


(4.79) 


where o is the density of the material in units of g/cm?. 

In a thin scatterer, a Gaussian distribution is but a poor approximation of the 
actual distribution of the multiple scattering angle, because of the latter's large 
range. A better, though still far from perfect, approximation can be obtained using a 
normal mixture with two components, one modeling the “core”, the other modeling 
the "tails" [10, 18]. The standardized two-component mixture PDF of 0, with 
variance equal to 1 has the following form: 


f Op) = 1 — 5) p (0p; 0,02) + £ - 9 (05; 0, 02), (4.80) 


where o? < o2, e < 1/2 and o2 = [1 — (1 — £) o2]/e. 
The core variance o? is parametrized in terms of the reduced thickness dj) = 


d/(ß?Xo), where Xo is the radiation length of the scatterer: 
a? = 8471-1071 + 3.347 - 107° - Indy — 1.843 - 107° - (Ind)? (4.81) 


The tail weight € is parametrized in terms of a modified reduced thickness dj = 
Z^Pd/(B. Xo): 


_ Jf 4841-10? + 6.348 - 1073 - Ind; + 6.096 - 1074 - (Indj)’, if Indy «0.5, 
~ | =1.908 - 10°? + 1.106 - 107! - In dë — 5.729 - 107° - (ndj), if Indj=0.5, 
(4.82) 


Finally, the standardized PDF in Eq. (4.80) is scaled with the total standard deviation 
of the projected scattering angle without the logarithmic correction; see Eq. (4.77). 
This ensures that the variance of the scattering angle is strictly additive if the 
scatterer is divided into thinner slices. For a comparison of the mixture model with 
simulations by GEANT4 [19], see [18]. 
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4.5.1.2 Multiple Scattering in Track Propagation 


If a charged particle is propagated through material, the covariance matrix of 
the track parameters is augmented by the additional uncertainty on direction 
and possibly position caused by multiple scattering. The algorithmic treatment is 
different for “thin” and “thick” scatterers. By definition, in a thin scatterer the 
offset—the change in position of the passing particle—is negligible in relation to the 
spatial resolution of the surrounding detectors, so that only the direction is affected. 
For instance, in a silicon sensor with a typical thickness of 0.3mm the standard 
deviation of the offset is less than 0.2 um for momenta above 1 GeV; see Eq. (4.107) 
below. As the spatial resolution is in the order of 10 um, the offset can be safely 
neglected, and the sensor is considered as a thin scatterer. 

If, on the other hand, the scatterer is a 5cm thick iron absorber in a muon 
spectrometer, the standard deviation of the offset is about 0.66 mm for a muon with 
1 GeV, and the offset can be as large as 2mm. This is no longer negligible in a 
muon spectrometer equipped with drift chambers having a typical spatial resolution 
of 150 um. The absorber should therefore be treated as a thick scatterer, at least for 
low-momentum muons. 


Thin Scatterers 

Consider a thin scatterer with nominal thickness d, radiation length Xo, and unit 
normal vector n at the point where the track crosses the scatterer. The track is 
specified by a vector q of track parameters and the associated covariance matrix 
C. In a thin scatterer, only the sub-matrix of C corresponding to the track direction 
is augmented. The details depend on the choice of the track parametrization; 
see Sect. 4.2. 


A. The track is parametrized as in Eq. (4.3): 


q 
qı = 2 q2 = $, q3 — tanA, q4 = RẸ, qs =z. (4.83) 


1. Compute the unit direction vector a and the momentum p of the track: 
a = (cos $ cos à, sin $ cos À, sina)! , D=1/(qilcosa). (4.84) 
2. Compute the effective thickness r in units of Xo: 


d|a | 
t = ——— —, 4.85 
[a - n| Xo ( ) 


3. Compute the variance of the projected multiple scattering angle, assuming 
P = 1 and q = 1 if unknown: 
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131097 


var [85] = 5 (1 + 0.038 Int)? . 
p 


4a. If sin = 0 compute the covariance matrix D of (a2, a3)!: 
D = J - C (2:3,2:3)- Jl, with 


zus 0 ) 


0 cos?X 


Augment D by the contribution of multiple scattering [1]: 


2 


m 
D' = D + var [0p] - lar dats \ 
-azaz 1-az 


Modify C: 
C (2323) = J7! - D'- (J^), with 


jie 1/cos @ cos A 0 
u 0 1/cos?AJ ` 


4b. If sin ó zz 0 compute the covariance matrix D of (a1, a3)!: 
D=J-C (23,23): J", with 


y=(— sin$cosA — cosó sin À cos? 
u 0 cos?A f 


Augment D by the contribution of multiple scattering [1]: 


"m x 
D' = D + var [65] - I AS ja 
—aja3 l— aj 


Modify C : 


C (2:3,2:3) = J7} - D' (J^), with 


pig i = oe 


0 1/cos’A 


B. The track is parametrized as in Eq. (4.4): 


qı = V, q = dv/du, q3 = dw/du, q4 = v, qs = w. 
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(4.86) 


(4.87) 


(4.88) 


(4.89) 


(4.90) 


(4.91) 


(4.92) 


(4.93) 
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1. Compute the direction vector a and the momentum p of the track: 


a =(q2,93,1)", p=1/lail- (4.94) 
2. Compute the effective thickness ¢ and the variance of the projected multiple 
scattering angle as in Eqs. (4.85) and (4.86). 
3. Extract the covariance matrix D of (qo, FILE 


D = C (2:3,2:3). (4.95) 


4. Augment D by the contribution of multiple scattering [1] and modify C : 


l-c42 943 
C (2:3,2:3) = D + var [0p] - (L + q2 + »( 2 . (4.96) 
5 me 9293 1-42 


C. The track is parametrized as in Eq. (4.5): 


q 
a= P. q2-— $0, q3 — À, qa — X1, 95 — y1. (4.97) 


1. Compute the unit direction vector a and the momentum p of the track: 
a = (cos $ cos A, sin $ cos À, sina)! ‚p=1/\qı|. (4.98) 


2. Compute the effective thickness ¢ and the variance of the projected multiple 
scattering angle as in Eqs. (4.85) and (4.86). 
3a. If sind = 0, compute the covariance matrix D of (a2, a3)": 


D-2J.C(232:3)- J", with 


_ feosdcosA 0 
= ( 0 cos p) ` i 


Augment D by the contribution of multiple scattering as in Eq. (4.88) and 
modify C : 


C (2:3,2:3) = J7! - D' - (J7!)", with 


-ı _ (1/cosġ cosà 0 
= ( 0 dea ne 
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3b. If sind Z 0, compute the covariance matrix D of (a1, a3)": 
D = J - C (2:3,2:3)- J", with 


J= e cosà — en . (4.101) 
0 COS À 


Augment D by the contribution of multiple scattering as in Eq. (4.91) and 
modify C : 


C (2:33:33 = J7! - D' (J-)!, with 


poh = (~H/sing cosa — cos ósinA/sin cos” (4.102) 
a 0 1/cos à l 


Thick Scatterers 

A thick scatterer can be treated in two ways: sliced into a number of thin scatterers 
or considered a single scatterer. The first approach gives more precise results, in 
particular when the scatterer is magnetized and the incoming particle is expected to 
be strongly deflected and suffer significant energy loss. A typical example for this 
situation is a low-momentum muon crossing the instrumented iron return yoke of 
the CMS experiment [20]. 


In the second approach, the uncertainties of both the direction and the position 
at the exit point have to be increased. In addition, energy loss may have to be 
considered, and an estimate of the actual track length L in the scatterer must be 
available. If no estimate of L is returned by the track propagation algorithm, L can 
be approximated by the distance between the entry and the exit point times a safety 
factor that depends on the momentum and can be tuned by simulation. Alternatively, 
L can be determined from the most probable trajectory in the scatterer [21]. 

Assume that the track parameters and their covariance matrix at the exit of the 
scatterer are given by q and C in the curvilinear parametrization; see Eq. (4.97). 


1. Extract the entry and exit momenta pı and p2. Under the assumption that the 
momentum p drops linearly between the entry and the exit, the average of 1/p? 


is 1/(pı p2). 
2. Compute the variance per unit length of the projected multiple scattering angle: 


1 > 1.85-10-4-t 
t= ==, 009 = mm 


(1 + 0.038 In i. (4.103) 
Xo pı p2 


3. Extract the covariance matrix D of (q2, 93, q4, a): 


D = C (2:5,2:5). (4.104) 
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4. Compute the unit direction vector a in the global coordinate system: 
a = (cos $ cos A, sin $ cos A, sind) . (4.105) 


5. Transform a into the curvilinear system by rotating a into a’ = (0, 0, WIE 


ai?a3 + ar” a, a2 
1— az? 1] + a3 
a' = Ua, with U = aja ay +araz . (4.106) 
a 
1 +a3 1 — a3? 2 
aı a2 a3 


6. Compute the joint covariance matrix E of q' = (aj, a5, x1, yı) [1]: 


L 0 LP 0 

0 L 0 Pp 
E-—o. 4.107 
1712/2 0 13/3 0 an 


0 2 0 PR 


N 


7. Rotate the direction part of E back into the global system: 


a1?a3 + a am 0 
1 — a3? 1+ a3 
E —T.E.T!, with T= an Inch (4.108) 
1+a3 1 — a3? 
0 0 10 
0 0 01 


8. Transform the direction cosines aı, a» back to $, A: 
E"=J:E.J', with 


—sind/cosX cos®/cosi 0 0 
—cosó/sinAÀ —sinó/sinA 0 0 


J= 0 0 vol: if 740; and (4.109) 
0 0 01 
—sing coso 0 0 
0 0 00 i 
= fà =0. 4.11 
d o d: ori nn 
0 0 01 


9. Augment D by the contribution of multiple scattering and modify C : 


C (2:5,2:5) = D+ E". (4.111) 
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4.5.2 Energy Loss by Ionization 
4.5.2.1 Mean Energy Loss 
A heavy, charged particle passing through material suffers loss of energy due to 


ionization of the material. The mean energy loss of the particle is given by the 
Bethe-Bloch formula [14]: 


dE _ "T Zp ls 2mec? B? y? Winax 
2 D? 


gp ? (4.112) 
ds * AP? i l 


2 

where K is a constant, z is the charge number of the incoming particle, Z and A are 
the atomic number and atomic mass of the material, p is the density of the material, 
me is the electron mass, Wmax is the maximum energy transfer in a single collision, / 


is the mean excitation energy, and ô is the density effect correction. For an incoming 
particle of mass M, Wmax is given by: 


2m,c? bB? y? 
1 +2yme/M + (me/M)? 


Wmax = (4.113) 


For light particles such as electrons and positrons, the Bethe-Bloch formula needs 
some modifications. An alternative expression for light particles is [1]: 


dE Zp 2m,c? 
=-K In + 1.5Iny — 0.975). (4.114) 
ds A I 


4.5.2.2 Ionization Energy Loss in Track Propagation 


In tracking detectors, material layers traversed during propagation are often con- 
sidered as discrete. Knowing the thickness of the traversed layer, the Bethe-Bloch 
formula is used to modify the momentum part of the track parameter vector by the 
expected change before propagating to the next layer. Fluctuations in the ionization 
energy loss are often considered so small that they are neglected. 

For detectors such as electromagnetic or hadronic calorimeters, however, ion- 
ization energy loss takes place continuously during propagation. In this case, it 
is possible to augment the vector of global track parameters in Eq. (4.55) with 
a parameter containing the track momentum [3, 22], e. g., € = kw as defined 
in Sect. 4.4.2: 


u=(x,y,z,8)), i (tty t), (4.115) 


m 


where Æ is an auxiliary parameter corresponding to the integrated change in £. 
Track parameter and Jacobian matrix propagations can thereby be carried out in 
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way very similar to the one described in Sect. 4.4.2, effectively including effects of 
energy loss continuously while traversing the detector. 


4.5.3 Energy Loss by Bremsstrahlung 
4.5.3.1 Mean and Distribution of the Energy Loss 


High-energy electrons lose energy in a material mainly by bremsstrahlung [14]. The 
dependence on the material can be summarized in a characteristic length, called the 
radiation length X». It is defined as the average distance over which a high-energy 
electron loses 1 — 1/e ~ 63% of its energy. Although it is not strictly identical to 
the radiation length that is characteristic for multiple scattering (see Sect. 4.5.1), the 
same length is used in both contexts for the sake of convenience. 

The rate of the mean energy loss by bremsstrahlung is nearly proportional to the 
energy: 


e| e (4.116) 


and on average, the energy decreases approximately exponentially as a function of 
the reduced path length t = s/ Xo: 


E[E(t)] = Eo exp(-t), (4.117) 


where Eo is the initial energy. The energy loss is subject to large fluctuations, as a 
substantial part of the electron energy can be carried away by a single photon. A 
simplified PDF of E as a function of t has the following form, called the Bethe- 
Heitler model [23]: 


— [In(Eo/ E] / 271 
h(E) — ET (t/1n2) ; (4.118) 


The PDF can be rewritten in terms of the remaining energy fraction z = E/ Eo: 


u (—Inz) t/In2-1 
f@) = —TG/a2 (4.119) 


It can be seen that — In z is Gamma-distributed. The expectation and the variance 
can be computed explicitly [24]: 


E[z] = exp (—t), var[z] = exp (—t1n3/1n2) — exp (—21). (4.120) 
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Fig. 4.8 Probability density function of the Bethe-Heitler model of bremsstrahlung in Eq. (4.119), 
with mean u and standard deviation o, for t = 0.2, 0.1, 0.05, 0.02 


Figure 4.8 shows that the shape of the PDF is very far from being Gaussian; thus, 
the distribution cannot be adequately described by merely its mean and variance. 


4.5.3.2 Approximation by Gaussian Mixtures 


For electron reconstruction with the Gaussian-sum filter (GSF; see Sect. 6.2.3) the 
model PDF in Eq. (4.119) is approximated by a normal mixture PDF with ne 
components. The parameters of the mixture are determined by minimizing some 
measure of distance between the two distributions. In [25], two distances have been 
used: Dr, the Kullback-Leibler distance, and Dcpr, the integral over the absolute 
difference of the respective cumulative distribution functions (CDFs): 


1 
px. - f In f (z)/8(z)] f(z) dz, (4.121) 


oo 
Dcpr =f | F(z) — G(z)| dz, (4.122) 
—oo 
where f(z) and F(z) are the PDF and CDF of the model distribution, and g(z) and 
G (z) are the PDF and CDF of the normal mixture, respectively. Using Dg, and 
n, = | the single Gaussian with the correct first two moments is recovered. In all 
other cases, the mixtures do not have the same moments as the model. The quality 
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of 


the approximating mixtures has been investigated in detail in [25]. Software in 


the form of a MATLAB® function can be downloaded from the URL in [26]. 
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Chapter 5 A 
Track Finding m 


Abstract There is no systematic theory of track finding yet. Therefore, the first 
section of this chapter presents a list of basic techniques which have been success- 
fully used, stand-alone or in combination, in past and present experiments. Among 
them are the conformal transformation, the Hough and the Legendre transform, 
cellular automata and neural networks, pattern matching, and track following by 
the combinatorial Kalman filter. The following section gives a brief excursion into 
online or real-time track finding in the collider experiments CDF, ATLAS, and CMS. 
As track finding in most cases delivers some candidates that do not correspond 
to actual particle tracks, the concluding section discusses methods for an efficient 
selection of valid candidates. 


5.1 Basic Techniques 


5.1.1 Conformal Transformation 


Circles in the plane passing through the origin can be transformed into straight lines 
by the following mapping [1]: 


- n (5.1) 
“= =; VU = > , 
x2 + y? x2 + y? 
The mapping is conformal, i.e., preserves angles between curves. Assume that a 
circle is defined by the equation: 


(xa + O-b? = R = +b. (5.2) 


Expansion of the left hand side and division by x? + y? gives a linear equation in u 
and v: 


2au + 2bv = 1. (5.3) 
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This is the equation of a straight line in the (u, v)-plane with distance d = 1/(2R) 
from the origin. A circle with a large radius R or small curvature « is therefore 
transformed into a line that passes very close to the origin (Fig.5.1). In the 
limit of zero curvature, the circle becomes a line transformed into itself by the 
mapping in Eq. (5.1). Both circle finding and circle fitting can be simplified by this 
transformation from circles to straight lines. 


5.1.2 Hough Transform 


The Hough transform [2] is a technique that finds clusters of points that lie on 
or close to a parametric curve such as a straight line or a circle. The number 
of parameters is usually two or three. In the simplest case, there is a set of 
points ((x1, y1), .... (Xn, Yn)} in the plane (image space) that lie on a straight line, 
parameterized by y = kox + do, where Ko is the slope and dọ is the intercept 
of the line. A line passing through (x;, y;) fulfills the equation d = —x;k + yi, 
which is the equation of a line £; in the parameter space (Hough space) of k and 
d. The point of intersection of two lines £;, £; is just (ko, do), the parameters of 
the original line. Finding straight lines in the image space is therefore equivalent to 
finding intersection points in the Hough space. 

In practice, the measured points do not lie exactly on a straight line, and the lines 
in the Hough space do not intersect exactly in a single point. The usual approach is 
to define a binning in the Hough space and count the number of lines crossing each 
bin. Peaks in the 2D histogram correspond to lines that are close to many points in 
the image space. The size of the bins depends on the distribution of the measurement 
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Fig. 5.1 Conformal transformation of the circles through the origin in the (x, y)-plane (left) into 
lines in the (u, v)-plane (right) 
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errors and can be tuned on simulated tracks. Alternatively, a 2D binary search can 
be performed using a quadtree data structure [3, 4]. 

The parametrization of the lines by y = kx + d can be numerically problematic 
if very large values of the slope k are possible. A more robust parametrization of the 
line has the form x cos o + y sing — c = 0. The curve in the Hough space of p and 
c passing through the point (x;, yj) is a sinusoid with the equation 


c = xj cos Q + yi sino = r; sin (o + gj), with 


ri =x? + y7, qi = arctan (x;/yi). (5.4) 


If the curve to be found in the image space is a circle through the origin, there are two 
possibilities. The problem can be reduced to the straight line case by a conformal 
transformation, or a circle through the origin can be parameterized in the following 
form: 


x? — 2x R cosg + y? — 2y R sing = 0, (5.5) 


where R is the circle radius and ø is the azimuth of the circle center in polar 
coordinates. The curve in the Hough space of x = 1/R and @ passing through 
the point (x;, yj) is a sinusoid with the equation 


2. 
k = camp gi), (5.6) 
I 
with r; and gj as above, see Fig. 5.2. 
If the curve to be found in the image space is a circle in general position with the 
equation 


(x — xc)? + (y — ye)” — R? = 0, (5.7) 


the constraint that the circle passes through the point (x;, y;) defines a second order 
surface in the 3D Hough space of xc, yc, z: 


z — (x; — Xe)? + (yi — Ye)”, with z = R?. (5.8) 


It follows that finding circles requires finding intersection points of surfaces in a 3D 
histogram, which is computationally much more expensive than the same problem 
in 2D. A search in 3D can be based on octrees, the 3D analogues of quadtrees [4]. 

An alternative is the randomized Hough transform [5], that randomly selects 
triplets of points. The center of the circle passing through the triplet and its radius are 
stored in a 3D histogram. Peak finding can be done in 3D or in the 2D histogram of 
the circle centers. After finding a peak, the best circle center and radius is obtained 
by computing the medoid [6] of the entries in the peak bin. 
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5.1.3 Artificial Retina 


The concept of the “Artificial Retina” was introduced in [7]. Similar to the Hough 
transform, it relies on a partition of the track parameter space into cells. Figure 5.3 
shows a simple example with a track in 2D, specified by slope k and intercept d. 

The intensity of a cell is the sum of the responses of its associated receptors to 
the hits that are present in the layer. Assuming a Gaussian response, the intensity 
R(k, d) of the cell centered at (k, d) is given by: 


n 


R(k,d) - Y Y exp (75 /20?) (5.9) 


i=l j=1 


where s;; = yij — (kx; + d) is the distance of hit j in layer i from the ideal 
track position in layer i, and o; is a scale parameter that regulates the width of 
the receptive field in layer 7. Other response functions are of course possible, and 
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5r E 
Cell for k = —0.5,d = 1.5 


s 1 
0.5} 1 .5 | ———— Ideal track 
Track receptors 
Real track hits 
0 
-1 -0.5 0 0.5 1 2 


Fig. 5.3 Left: A cell in the parameter space of (k, d) corresponds to an ideal track with these 
parameters. Right: The corresponding track receptors represent the intercepts of this ideal track 
with the tracking layers. The hits of a real track are close to the track receptors within the 
experimental resolution. (Adapted from [11], by permission of Elsevier) 


their shape and width can be adjusted for optimal performance. As with the Hough 
transform, track candidates correspond to the local maxima of intensity in parameter 
space. 

The artificial retina is eminently suitable for high-speed track finding, as it can be 
highly parallelized and implemented on commercial FPGAs [8, 9]. For applications 
in the vertex locator of LHCb (Sect. 1.6.1.4) and in a test beam, see [10, 11]. 


5.1.4 Legendre Transform 


The Legendre transform is an extension of the Hough transform, used to find com- 
mon tangent lines or tangent circles through the origin to a set C of circles [12, 13]. 
In the context of track finding, the circles in C are drift circles in a drift chamber or 
a drift tube chamber [14], see Fig. 5.4. Assume a drift circle in the plane, with center 
(xw, yw) and radius p. A line parameterized by x cos 9 + y sing — c = 0 is tangent 
to the circle if, and only if, its signed distance from the circle center is equal to +p: 


Xw Coso + y,sing — c = +p; or c= xycosQ + yy sing + pi. (5.10) 


The drift circle in the image space (x, y) therefore corresponds to two sinusoids in 
the Legendre space (o, c), see Fig. 5.4. The further procedure is the same as with 
the Hough transform. 

A circle through the origin can be parameterized by its radius R and the angle 
$, where the circle center is given by x; = Rcos®, y, = Rsin®. Such a circle 
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Fig. 5.4 Top: Image space Image space 
(x, y); bottom: Legendre 6 T T T 
space (p, c). The circled 
point in the Legendre space 
corresponds to the straight 
line in the image space 


touches the drift circle with center (xy, yw) and radius p if, and only if, the squared 
distance of the circle centers is equal either to (R 4- DE orto (R — p): 


2R (+p + xy cos 6 + yw sin ®) = x2 + y2 — p?. (5.11) 


In order to avoid large radii in the limit of the circle approaching a straight line, the 
Legendre space is chosen as (x, ®), where x = 1/R is the curvature of the circle. 
The drift circle again corresponds to two sinusoids in (x, ©): 


B 2(Xp + xw cos ® + yy sin 9) 


K 2 


(5.12) 
xZ +y% -p 


The task of finding a circle through the origin can be reduced to the task of finding 
a straight line if the Legendre transform is preceded by a conformal transformation, 
see Sect. 5.1.1, which transforms the circle into a straight line while transforming 
the drift circles into circles. 
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5.1.5 Cellular Automaton 


A cellular automaton (CA) is a dynamical system where space, time, and variables 
are discrete [15]. It has five fundamental defining characteristics [16]: 


. It consists of a discrete lattice of sites. 

. It evolves in discrete time steps. 

. Each site takes on a finite set of possible values. 

. At each site, the value evolves according to the same deterministic rules. 

. The rules for the evolution of a site depend on a local neighbourhood around it. 


AWN Re 


Probably the best known CA is Conway’s “Game of Life” [17]. Like many, but not 
all, subsequently proposed cellular automata, it assumes that the cells are located 
on a regular 2D rectangular lattice. However, this assumption is too restrictive for 
the application to track finding. Instead, the cells are represented by the nodes of a 
graph, and the neighbourhood of a cell C; is the set of all cells connected by an edge 
to Cj. 

The neighbours of a cell are divided into “inner” and “outer” neighbours such that 
if C; is an inner neighbour of C;, then C; is an outer neighbour of C;. The possible 
states of a cell are the non-negative integers. In practice, the states are bounded by 
some upper limit depending on the number of detector layers. The initial states of 
all cells are set to zero. 

The earliest applications of the CA to track finding are described in [18—21]. 
With the exception of [18], cells are defined as short track segments connecting 
hits in adjacent detector layers; segments that skip a layer are sometimes allowed as 
well [22, 23]. Each cell has an inner hit and an outer hit according to the arrangement 
of the detector layers. Two cells are neighbours if the outer hit of one cell is the inner 
hit of the other cell. In addition, the neighbourhood relation can be further restricted 
by imposing a cut on the angle spanned by the two cells (segments). It is the task 
of the CA to find chains of neighbouring segments that correspond to actual tracks. 
This is achieved by the following rule of evolution. At each time step, the state 
of each cell is augmented by 1 if it has the same state as its inner neighbour. The 
states of all cells are updated synchronously. When no state changes anymore, the 
evolution is stopped. At this point, the state of a cell is the length of the longest 
unbroken chain of segments terminating in this cell. An illustration of the CA is 
shown in Fig. 5.5. 

The actual search for track candidates starts with the cells with the highest state. 
If such a cell has an inner neighbour with a state that is smaller by 1, it is attached to 
the track candidate. This procedure is repeated until a cell with no inner neighbour 
is reached. If, at any point, several neighbours can be attached to the track candidate, 
either the “best” cell according to some criterion is selected, or the track candidate 
is split into two, and each candidate is followed independently. This inevitably 
results in candidates that share hits, requiring a final selection of compatible (non- 
overlapping) candidates. This is the topic of Sect. 5.3. 
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Fig. 5.5 Illustration of the cellular automaton algorithm. It creates tracklets and links, numbers 
them as possibly situated on the same trajectory, and collects tracklets into track candidates. 
(From [24], by permission of Elsevier) 


In tracking detectors with few layers and little redundancy [25], the CA can be 
complemented by prior information stored in a sector map [26, 27]. In this concept, 
the sensors are partitioned into sectors. In order to create a sector map, a large 
training sample of simulated tracks in a chosen angular and momentum region 
is generated. Two sectors are declared as “friends” if a sufficiently large number 
of tracks passes through them without hitting any other sector in between. The 
friendship relation defines an acyclic directed graph that is stored in the sector map. 

In the track-finding phase, the hits are sorted into their sectors, and the segment 
finder is activated. It creates pairs of hits in friendly sectors and keeps those pairs 
(segments) that pass various cuts, which are also stored in the sector map. If two 
segments share a hit such that the outer hit of the inner segment is the same as the 
inner hit of the outer segment, they are passed to the neighbour finder, which applies 
additional cuts, also stored in the sector map. Finally, all surviving segments form 
the cells of the CA. 

Different sector maps for different geometrical or kinematical regions can be 
created and applied sequentially. For instance, high-momentum, high-quality tracks 
can be found first by using a sector map with tighter cuts. After removing their hits, 
a second sector map with less selective cuts can be used to find the remaining tracks. 

Another extension of the CA, termed the 4D CA, is described in [28, 29]. Pairs 
of segments or triplets of hits are accepted only if the time stamps of the hits are 
consistent with the hypothesis that they have been created by the same charged 
particle. For the application of the 4D CA in the CBM experiment, see Sect. 11.2. 
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5.1.6 Neural Networks 


The application of neural networks to track finding was first proposed independently 
in [30] and [31]. In this approach, the network is of the Hopfield type [32], the 
neurons being track segments that connect observations in adjacent or nearby layers 
of the detector; see Sect.5.1.6.1. More recently, the HEP.TrkX pilot project was 
established with the aim to develop deep neural networks for track finding in 
high-multiplicity environments typical for the LHC era [33]. Two deep networks 
are described in Sects. 5.1.6.2-5.1.6.3. A follow-up project, called Exa.TrkX, was 
started with a kick-off meeting in June 2019 [34]; a second workshop was held in 
April 2020 [35]. A reference to first published results can be found in Sect. 5.1.6.3. 


5.1.6.1 Hopfield Network 


A Hopfield network is a fully connected network with a single layer of neurons. In 


the simplest case, the neurons are binary with two states: s; = +1, i = 1,...,n. 
Each pair (i, j) of neurons has a fixed connection weight w;; with w;; = wj; and 
wii = 0. The states of the neurons evolve in discrete time steps according to the 


following prescription: 


si(t) = sign] X wijs;a-D|. (5.13) 
j=l 


The update can be synchronous (the states are recomputed in parallel) or asyn- 
chronous (the states are recomputed sequentially). The network has an associated 
function E (s), defined as: 


1 
E(s)= EC Ds Wij Si Sj, (5.14) 
i,j=1 
where s = (51, ..., Sn) is the state of the network. In analogy to the theory of spin 


glasses, E (s) is called the energy function of the network. It can be shown that E (s) 
is a non-increasing function of the time ¢ and that the update rule Eq. (5.13) leads to 
a local minimum of E (s) [36]. 

In most applications, including the one discussed here, the aim is to find the 
global minimum rather than a local one. To this end, thermal noise is introduced 
in the network. At temperature 7, the state s is Boltzmann distributed with the 
probability function 


P(s) = m [-E(s)/T], with Z = y epI-EG)T]. (5.15) 
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As the number of possible states rises exponentially with the number of neurons, 
the partition function Z is computed in the mean-field approximation [37], and the 
thermal average v; of s; is given by: 


(si) tanh a (5.16) 
vj = (s)r = tanh | -— — J, ; 

i i/T T àv; 

where the states v = (v1, ..., Un) are now continuous in the interval (—1, 1). The 


definition of the energy function is analogous to Eq. (5.14): 


1 n 
E(v) — > 2. Wij Vi Vj, (5.17) 
i,j= 


and the update is modified accordingly: 
1 n 
v;(r) = tanh | = wi vj(t—1)]. (5.18) 
j= 


Finding the global minimum or at least a low local minimum of the energy function 
is facilitated by deterministic annealing [38]. First, the energy function is minimized 
at high temperature; the temperature is then lowered according to a predefined 
cooling or annealing schedule. At low temperature, the states of the network are 
close to either 1 (active) or — 1 (inactive). 

For the purpose of track finding, the problem has to be mapped on the Hopfield 
network such that the final state of the network corresponds to the solution of the 
problem. Similar to the CA, the neurons are short track segments connecting space 
points in adjacent or nearby layers of the tracking detector. To keep the number of 
neurons manageable, geometric cuts ensure that only segments that can be part of an 
actual track in the momentum range of interest are used as neurons. The sector map 
introduced in Sect. 5.1.5 can be used to store the cuts. A track can be considered as 
an unbroken chain of segments, so only pairs of segments sharing a point qualify for 
having a positive weight. Consider a triple of points (k, l, m) in consecutive layers, 
defining two neurons vg and vj. The weight Wgm depends on the angle between 
the segments and on their length. In [31], it is defined as follows: 


cos* Ok 


————, (5.19) 
dij + dim 


Wkim = 


where à is an odd exponent and dj; is the length of neuron v;;, either in space 
or in the projection to the bending plane. This definition of the weights favours 
combinations of short segments with small angles. Small weights can be set to zero. 

Constraints on the possible final configurations of the active neurons can be 
included by adding a cost or penalty term to the energy function of the network. This 
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serves to prevent association of a point to several tracks and to get approximately 
the expected number of active neurons in the final state. For example, in [39] the 
following cost term was used: 


C(v)-a| Yl vein Yl vem | — 5. (5.20) 


k,l,n,I#n k,l,m,kZ£m 


where o is a Lagrange multiplier and b is a small constant bias term. Minimizing 
the first term leads to a competition between neurons starting or ending in the same 
point, so that in the end at most one should survive. The final update equation is 
given by Eq. (5.16). 

For an evaluation of the performance of the Hopfield network on real data from 
an experiment at the LEP electron-positron collider, see [39]. For an application to 
tracking in the ALICE experiment at the LHC, see [40]. 


5.1.6.2 Recurrent Neural Network 


The dynamics of a particle track can be modelled by a nonlinear state-space 
model; see Sect. 3.2.3. A track is, thus, very similar to a discrete time series, the 
main difference being that the observations are not progressing in time but in 
space. Recurrent neural networks (RNNs), in particular networks of long short-term 
memory (LSTM) neurons [41, 42], are routinely used for forecasting time series 
arising in various contexts: financial, industrial, weather, traffic, etc. It is, therefore, 
to be expected that RNNs can be trained to learn the dynamics of a track and follow 
it in a way similar to the extended Kalman filter. The crucial difference is that in 
the Kalman filter, the dynamics is explicitly coded in the system equation and in the 
distribution of the process noise, whereas in the RNN, the dynamics is implicit in 
the weights learned by the network on a training set of examples. 

The first successful attempt to train an RNN for track finding is described 
in [43]. Two models are presented. The first is a sequential hit predictor that, 
given a sequence of past hits, predicts the position of the next hit. The second 
model augments the first one by predicting the covariance matrix of the hit, 
using a Gaussian distribution. Both predictors are implemented as an LSTM layer, 
followed by a fully connected (FC) layer. The scheme of the Gaussian predictor is 
shown in Fig. 5.6. The training data for the RNN network and the GNN network 
in Sect. 5.1.6.3 were generated by the ACTS package [44]. 


5.1.6.3 Graph Neural Network 


Like the CA described in Sect. 5.1.5, track finding with the graph neural network 
(GNN) is based on the representation of the tracking data by a graph [43]. The 
detector hits are the nodes (vertices) of the graph, and two nodes are connected by 
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Fig. 5.6 Diagram of the Gaussian hit predictor model that takes a sequence of 3D coordinates 
as input and produces bivariate Gaussian probability distributions as next-step predictions. The 
architecture is the same as the basic hit predictor, but the model provides additional output that 
parameterizes the Gaussian covariance matrix. (Adapted from [43], by permission of the author) 


Fig. 5.7 Diagram of the Graph Neural Network model which begins with an input transformation 
layer (IL) and has a number of recurrent iterations of alternating EdgeNet (EN) and NodeNet 
(NN) units. In this case, the final unit is an EN, making this a segment classifier model. (Adapted 
from [43], by permission of the author) 


an edge if they are compatible according to some criterion. Such a criterion can 
be stored in a sector map as the one described in Sect.5.1.5. The graph is fed into 
the GNN that consists of an input transformation layer (IL) followed by alternating 
units of edge networks (ENs) and node networks (NNs), each implemented as a 
multi-layer perceptron with two layers, see Fig. 5.7. 

An EN computes a new weight for every edge from the features of the end nodes 
while an NN computes new features for every node from the current features and 
the edge-weighted aggregated features of all connected nodes in the adjacent layers. 
The network can be used to classify whether nodes/hits or edges belong to a track or 
not. The network in Fig. 5.7 classifies edges, as the last unit is an EN. An extension 
of the work in [43] to track seeding and hit labelling with GNNs is described in [45]. 


5.1.7 Track Following and the Combinatorial Kalman Filter 


In track finding methods such as the Hough transform or the CA, the complete set 
of all hits serves as the primary input. Such methods have, therefore, been dubbed 
“global” [46]. This is in contrast to methods that find tracks locally or sequentially, 
one after the other. The most prominent example of a sequential method is track 
following. 

In track following, a track candidate starts from a “seed”, i.e., a short track 
segment. This seed can in principle be anywhere in the tracking detector. Generating 
seeds in the outer layers of the trackers has the advantage of smaller occupancy and 
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less background from low-momentum tracks that spiral in the inner layers of the 
tracker. Generating seeds in the inner layers, which are frequently pixel layers, has 
the advantage of using 3D hits with higher resolution both in the bending plane 
and in the longitudinal direction. As will be described in more detail in Chap. 10, 
ATLAS and CMS, the two general-purpose experiments at the LHC, have opted for 
the second solution. 

The generation of seeds is often a simple combinatorial search for compatible 
triplets or quadruplets of hits, potentially assisted by a CA [47], and includes 
information about the size and position of the beam spot; see Sect.7.1. Some 
examples of seed generation algorithms will be given in Chap. 10. 

Once the seeds have been found, each seed is then followed through the tracker 
by extrapolating it toward the outside of the tracker or toward the production region, 
depending on where the seed is situated. After each extrapolation step, compatible 
hits are searched for and attached to the track candidate. 

The progressive track recognition described in [48] can be extended to a com- 
binatorial Kalman filter (CKF), introduced in [49, 50] under the name “Concurrent 
Track Evolution”, see Fig.5.8. First, each seed is fitted with one of the methods 
described in Chap. 6. The parameters and the covariance matrix of the seed are then 
extrapolated to the nearest tracker layer, taking into account interactions with the 
detector material; see Sects. 4.3, 4.4 and 4.5. The hits in the sensor in which the 
extrapolated trajectory intersects with the layer are tested for compatibility with the 
predicted track parameters using a chi-square statistic; see Sects. 3.2.3 and 6.1.2. If 
n compatible hits are found, n copies of the predicted state, i.e., its track parameters 
and its covariance matrix, are generated, and each one of them is updated with one 
of the n hits according to the Kalman filter, Eqs. (3.29) and (3.30) or Eqs. (3.31) and 
(3.32). The original state is also kept and marked as having a missing hit, giving a 


Fig. 5.8 Schematic view of concurrent track evolution in a five-layered part of a tracking system 
with hexagonal drift cells, which is traversed by three particles, labelled T1, T2 and T3. The 
simulated drift time isochrones are indicated by circles. The propagation proceeds upstream from 
the right to the left and starts with a seed of hits from track T1 outside of the picture. (From [49], 
by permission of Elsevier) 
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total of n + 1 track candidates. This procedure is iterated on each track candidate 
until the last layer of the tracker is reached or the count of missing hits in a candidate 
exceeds a preset threshold, typically one or two. 

In the course of the combinatorial Kalman filter, it may be necessary to limit the 
number of active candidates for reasons of memory and/or speed. In this case, the 
“worst” track candidates are discarded and not followed anymore. The quality of a 
track candidate can be measured by a combination of its total number of hits, its 
number of missing hits, and its total chi-square Xe (Sect. 6.4). The tuning of the 
combination is usually performed on simulated data, where the correct association 
of hits to tracks is known. 

Avoiding a combinatorial explosion is an important issue in experiments with 
high track multiplicities. Therefore, the CKF in, for instance, ATLAS and CMS 
starts with seeds in the pixel detector with its very high resolution in all three spatial 
dimensions. As a consequence, the compatibility test in the first non-pixel layer 
rejects wrong hits with a high probability. As the CKF proceeds, the state becomes 
more precisely known, and the probability of attaching a wrong hit becomes even 
smaller. For more details, see Sects. 10.2 and 10.3. 


5.1.8 Pattern Matching 


Pattern or template matching is mostly used in real-time track finding for the purpose 
of triggering on charged tracks (see Sect. 5.2). It can be applied to detectors with 
a layer structure, each layer being segmented into sectors or bins [51, 52]; see 
also Sect. 5.1.5. A charged particle crossing the detector generates hits in certain 
bins of the layers, thereby creating a pattern of “on” and “off” bins, which can be 
coded as strings of zeros and ones. The set of physically meaningful patterns is 
generated by extensive simulations and stored in a pattern bank. 

The number of patterns to be stored depends on the geometry and size of the 
detector, on the characteristics of the tracks to be found, and on the granularity 
of the binning. For the purpose of triggering a lower bound on the momentum is 
usually imposed; therefore, only the patterns of tracks with momentum above the 
threshold need to be stored. The granularity of the binning determines how well two 
nearby tracks can be separated, and therefore depends on the occupancy of the layer. 
If the binning is very coarse, for instance, only one bin per module in a silicon strip 
tracker, fewer patterns have to be stored, but two nearby tracks cannot be resolved 
(see Fig.1 in [52]). If the binning is very fine, for instance, a bin for each strip in 
the extreme case, nearby tracks can be resolved almost perfectly, but the number 
of patterns is far too large to store. Also, higher track multiplicity implies smaller 
track separation on average, which in turn requires finer granularity. In any case, the 
granularity has to be optimized by extensive simulation studies. 

In an event, a particular configuration of hits is generated and translated into a 
pattern. This pattern is compared to all patterns in the bank, and matching patterns 
constitute track candidates. The sketch in Fig. 5.9 shows an example of a pattern 
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Fig. 5.9 Top: Two tracks in a Two tracks in an event 
detector with four layers 
creating two patterns. 
Bottom: Four patterns in the 
pattern bank 
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generated by two tracks and some patterns stored in the pattern bank. To cope with 
inefficiencies of the tracking detector, it may be necessary to accept partial matches. 

As the number of patterns that have to be stored can be very large, the method is 
feasible only if there is sufficient memory and if the comparison can be made very 
fast and in a highly parallel mode. The matching has therefore to be implemented 
in VLSI hardware, using content-addressable or associative memories [51, 52]. The 
pattern can be arranged in a tree structure, starting with coarse granularity of the 
sectors and proceeding to finer granularity. It is also possible to store patterns with 
variable resolution [53]. 

Pattern matching was used for real-time track finding both in the vertex detector 
and in the drift chamber of the CDF experiment; see Sect. 5.2.1. Later applications 
include the FTK (Fast Track Trigger) for the ATLAS experiment, see Sect. 5.2.2, 
and a proposed track trigger for the new CMS tracker that will be installed for the 
HL-LHC, see Sect. 5.2.3. 
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An early proposal for online track finding by dedicated hardware is the one 
described in [52]. It is based on matching hit patterns in the tracking detector with 
a pattern bank stored in associative memory; see Sect. 5.1.8. As field programmable 
gate arrays (FPGAs) were still in their infancy at the time of the publication, the 
associative memory (AM) is an array of 400 custom VLSI chips [51] that can hold 
O(10°) patterns [51]. The pattern matching is organized as a tree search through 
different levels of spatial resolution [52]. This was soon followed by the actual 
implementation in the CDF experiment at the Tevatron collider. 


5.2.1 CDF Vertex Trigger 


The Silicon Vertex Tracker (SVT, [54]) was designed to provide track impact 
parameter information for the level-2 trigger of the CDF experiment [55]. It was 
realized in custom hardware [56, 57]. Track finding is done by an AM, refining the 
information from the XFT track processor (see below) that finds tracks in the central 
drift chamber for the level-1 trigger. Tracks are fitted by a farm of digital signal 
processors using a linearized fit that requires only scalar products; see Sect. 6.1.6. 

The upgraded SVT, now renamed Silicon Vertex Trigger, is described in [58, 59]. 
If an event passes the level-1 trigger, the SVT extrapolates the XFT tracks, associates 
hits in the silicon vertex detector, and computes the transverse impact parameter. Its 
average latency is 24 us. The hit association is performed by custom AM chips, the 
linearized track fit in FPGAs. 

The eXtremely Fast Tracker (XFT, [60, 61]) is a track processor that finds tracks 
with high transverse momentum in the central drift chamber of the experiment [62]. 
It is highly parallel and reports its results every 132 ns, in time for the trigger level-1 
decision. The XFT works with hits in the four axial superlayers. Track identification 
is done in two stages called the Finder and the Linker. The Finder searches for 
high- py segments in each of the superlayers, and the Linker searches for high- pr 
track candidates by combining segments from at least three (out of four possible) 
segments. Both stages use pattern matching to accomplish their tasks. 


5.2.2 ATLAS Fast Tracker 


The Fast Tracker (FTK) system of the ATLAS experiment is designed for global 
track reconstruction after each level-1 trigger [63, 64]. It enables the level-2 trigger 
to gain rapid access to tracking results. The system is based on the Silicon Vertex 
Trigger of the CDF experiment; see the preceding subsection. It uses hit data from 
four pixel layers and from both sides of four silicon strip layers, twelve in total. The 
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tracker volume is split into 64 regions or towers, which are processed independently. 
The sensors are divided into “superstrips” with a coarser resolution. 

Data processing starts with clustering in the pixel and in the strip sensors. The 
clustering algorithm is optimized for execution in an FPGA [65]. After clustering, a 
track is represented by a list of superstrips that corresponds to a pattern in the custom 
AM chip [66]. Pattern matching produces track candidates at coarse resolution; 
these are then refined by a high resolution track fit in an FPGA. Missing layers are 
allowed in both stages. The number of patterns that have to be stored is currently 
in the order of a billion. This number would have to rise by a factor of the order of 
10 at the HL-LHC [63], because of the larger number of channels in the tracker and 
of the higher track multiplicity, which requires finer granularity (see Sect.5.1.8). 
For this and other reasons, the FTK will not be upgraded for operation at the HL- 
LHC; instead, the focus will be on the acceleration of the track reconstruction 
software [67, 68]; see also Sect. 10.2. 

The fitted tracks are sent to the Second Stage Board wherein they are extrapolated 
to the remaining silicon layers and fitted again. Finally, duplicate tracks are removed 
based on the number of common hits and x? [64]. 


5.2.3 CMS Track Trigger 


Starting in 2026, the luminosity of the LHC is expected to increase by a factor of 
about ten above the current design value. The current CMS silicon tracker, having 
been in operation since 2009, cannot withstand the radiation level predicted for the 
HL-LHC and has no triggering capability. It will, therefore, be replaced by a newly 
designed tracker [69]. The new design features so-called pr modules as the basic 
sensing devices. A pr module consists of two closely stacked sensors, either a pixel 
and a strip sensor (PS) or two strip sensors (25); see [70] and Fig. 5.10. A charged 
particle crossing the stack generates a stub that consists of two clusters. Tracks of 
sufficiently high transverse momentum (pr > 2 GeV) have little curvature and a 
small offset in the sensor stack, in contrast to tracks with smaller pr, which are bent 


Fig. 5.10 A pr module of " 
the new CMS tracker. fa il 
High-momentum tracks pass 
the pr cut, low-momentum 
tracks fail. (From [70], 
reproduced under License 
CC-B Y-3.0) 
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more strongly and have a larger offset. Stubs that pass the cut on pr are the input to 
the track trigger. 

Three concepts are explored for reconstructing tracks at the level-1 trigger, two 
using an all-FPGA system [71], the third one using a combination of AM and 
FPGAs [72]. 


5.2.3.1 Time Multiplexing 


The all-FPGA system is based on the principle of time multiplexing [73]. The 
fundamental idea is that several sources send their information from a given bunch 
crossing to a single destination for processing. The architecture of the system has 
two layers: the first extracts and preprocesses the stubs and sends them to the second 
layer, which contains the track finding processors. 

Two track finding algorithms are investigated in the time-multiplexed track 
trigger, tracklets and Hough transform. The tracklet algorithm has the following 
stages: 


1. Stub organization: The stubs are sorted into sectors in $. 

2. Seeding: Tracklets are formed from stubs in adjacent layers. 

3. Projection: Tracklets are projected to other layers to search for matching stubs, 
both inside-out and outside-in. 

4. Fit: Track fit of stubs matched to the tracklet. 

5. Duplicate removal: Candidate selection based on x?. 


The second algorithm [71] has the following stages: 


1. Hough transform: Stubs on the same trajectory are transformed into lines that 
meet in the vicinity of a single point; see Sect. 5.1.2. 

2. Fit: Combinatorial Kalman filter, see Sect. 5.1.7. 

3. Duplicate Removal: Tracks are removed whose parameters do not correspond to 
the bin of the Hough transform where they were found. 


5.2.3.2 Pattern Matching 


Pattern matching is done in parallel in 48 regions called trigger towers [72]. Each 
tower has two boards for pattern recognition and track fitting. Pattern recognition 
is done with lower resolution data, which are compared to predefined patterns by 
a content addressable memory. Only patterns corresponding to tracks with pr > 
2 GeV are stored. If a match is found, the corresponding high-resolution data are 
sent to the track fitting module. The linearized track fit (Sect. 6.1.6) runs in the 
FPGA and computes the helix parameters and the x. 
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5.3 Candidate Selection 


After track finding, track candidates may share hits. If two candidates share more 
hits than is deemed acceptable, for instance more than one, the track candidates 
are called incompatible. The incompatibility relation can be represented by an 
undirected graph (V, E), where the n vertices v; € V, i = 1,...,n are the track 
candidates. Two incompatible track candidates v; and v; are connected by the edge 
eij = eji, which is defined as the unordered pair (v;, vj). The number of compatible 
track candidates can be maximized by finding an independent set of vertices of 
maximal size, i.e., a subset Vi C V of vertices, no two of which are connected by 
an edge. 

Alternatively, the graph can represent the compatibility relation, in which case 
two compatible tracks/vertices are connected by an edge. The problem is then to find 
a maximum clique, i.e., a fully connected subset V2 C V of vertices of maximal 
size. 

Both problems are NP-hard [74] so that finding a maximum independent set or 
maximum clique can be very time-consuming for large graphs. A set of C routines 
for finding cliques in a compatibility graph can be found in [75]. The fastest exact 
algorithm for finding independent sets published up to now is the one in [76]. An 
independent set can also be obtained by finding a vertex cover, i.e., a set V3 C V of 
vertices the removal of which leaves an independent set. In [77], it was shown that 
there is a one-to-one correspondence between minimal vertex covers or maximal 
independent sets and steady states of Hopfield networks [32] with nonpositive 
weights. In addition, such a network converges to its steady state in at most 2n 
steps. There may be many minimal vertex covers of different size, and the steady 
state reached depends on the initial state of the network, of which there are 2". 
Finding the minimal vertex cover by an exhaustive or random search of all initial 
states is, therefore, computationally infeasible for large n. 

In any case, finding the largest independent set is not necessarily the best 
approach for finding an "optimal" set of track candidates, as the quality of the track 
candidates (see Sect. 6.4) should be taken into account, too. If the quality of the 
candidate v; is quantified by a positive weight w;, the problem is now to find an 
independent set that maximizes the sum of weights (MWIS). Like its unweighted 
counterpart, the MWIS problem is NP-hard. For a recent approximative solution and 
numerous references to previous work, see [78]. 

If the weight w; is mapped to a quality indicator q; € [0, 1], the network in [77] 
can be generalized to a recurrent network with annealing that aims to find the set of 
compatible vertices with the largest sum of weights [79]; see also Sect. 11.1. 
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Chapter 6 A) 
Track Fitting m 


Abstract Track fitting is an application of established statistical estimation pro- 
cedures with well-known properties. For a long time, estimators based on the 
least-squares principle were—with some notable exceptions—the principal methods 
for track fitting. More recently, robust and adaptive methods have found their way 
into the reconstruction programs. The first section of the chapter presents least- 
squares regression, the extended Kalman filter, regression with breakpoints, general 
broken lines and the triplet fit. The following section discusses robust regression by 
the M-estimator, the deterministic annealing filter, and the Gaussian-sum filter for 
electron reconstruction. The next section deals with linearized fits of space points to 
circles and helices. The chapter concludes with a section on track quality and shows 
how to test the track hypothesis, how to detect outliers, and how to find kinks in a 
track. 


6.1 Least-Squares Fitting 


In this section, three methods are described that are based on the least-squares 
(LS) principle for estimation of the track parameters. They are linear or linearized 
regression, the (extended) Kalman filter, and regression with breakpoints. In the 
case of a strictly linear model, they are mathematically equivalent. With a nonlinear 
model, there may be small differences because of different choices of the expansion 
point(s). In the following, the more frequent case of nonlinear models will be 
described, which contains the linear model as a special case. 


6.1.1 Least-Squares Regression 


Assume that track finding has produced a track candidate, i.e., a collection of 


n measurements mı,...,m,„ in different layers of the tracking detector, along 
with their respective covariance matrices V1,..., Vn. The measurements may have 
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different dimensions m; and usually have different covariance matrices, resulting 
in a heteroskedastic model. The initial parameters of the track to be fitted to the 
measurements are denoted by p. They are assumed to be tied to a reference surface 
(layer 0). The regression model has the following form: 


m= f(p)+e, Efe] 2 0, Var[e] = V, (6.1) 


where m = (mı,...,m,)' and f= Sages io The function f, maps the 
initial parameters p to the measurement m; in layer k. It is the composition of the 
track propagators up to layer k (see Sect. 4.3) and the function that maps the track 
state to the measurement (see Sect. 3.2.3): 


Fic = hko fear ° Feat k—-2 9 ++ © fono fijo (6.2) 
Its Jacobian F, is given by the product of the respective Jacobians: 
Fy = Hy Pee) Fr-ıjk-2 - -- F2 Fijo. (6.3) 


The covariance matrix V is the sum of two parts, V = Vm + Vs. The first part 
is the joint covariance matrix of all measurement errors. These can virtually always 
be assumed to be uncorrelated across different layers, so that Vw is block-diagonal: 


Vu = blkdiag(Vi,..., V4), with V; = Var[ej], i=1,...,n. (6.4) 


The second part Vs is the joint covariance matrix of the process noise caused by 
material effects, mainly multiple Coulomb scattering; see Sect. 4.5. As in Sect. 3.2.3, 
the process noise encountered during the propagation from layer k — 1 to layer k 
is denoted by y, and its covariance matrix by Q,. The integrated process noise up 
to layer k is denoted by Ij. Linearized error propagation along the track gives the 
following expression for the covariance matrix of Ix: 


k 
Var Wi] — 5 Fei Qi Fiji» (6.5) 
i=l 


with 


Fijj = Fk 1Fk 1|k 2: Fiji for i «k and Fkk = I. (6.6) 


Ifi < k, I; and I; are correlated with the following cross-covariance matrix: 


Cov (fh, I] = Y Fi; Q; Fh (6.7) 
j=l 
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Error propagation from the track states to the measurements gives the final block 
structure of Vs: 


diis i H,Var[I] HT, if i=k 
C251 C2 ++- Con . T 
Vs=| . . . . |. with Ci 2 1 HiCov[, D] H], if i ck 
Mu OMS Cl, ifi>k 
Chi Cn2 dis Can ki Ó 


(6.8) 


Estimation of the initial state p is usually done by the Gauss-Newton method; 
see Sect. 3.2.2. A first approximation p of p has to be delivered by the track finder 
to be used as the expansion point to compute the Jacobians in Eq. (6.3). The updated 
estimate p, is obtained via: 


- ` u 7 a 
Pi = Po + (FOG Fo) | F] Gim — f (pl, Fo = Z Br (6.9) 
P | Po 


The corresponding chi-square statistic is given by: 
xi = [m — f (Gl Gim — f(p,)], with G = V^. (6.10) 


It is approximately x? distributed with M — m degrees of freedom, where M = 
dim(G) is the sum of all m;, and m is the number of estimated track parameters, 
usually five. 

The Jacobians are recomputed at the new expansion point p,, and an updated 
estimate p, is computed. This two-step procedure is iterated until the absolute 
difference Xp oT ties xà is below a predefined threshold or a maximal number of 
iterations is reached. 

The p-value of the chi-square statistic (see Sect. 3.2.1) is the primary quality 
indicator of the track fit; see Sect. 6.4. A small p-value indicates a misspecification 
of the model or at least one outlying measurement that does not fit to the track. The 
standardized residuals or pulls can be used to look for outliers. The residuals r of 
the fit are defined by: 


r - m — f(p), (6.11) 


where p is the final estimate after convergence. Their covariance matrix is obtained 
by linearized error propagation and is approximately equal to: 


R = Var[r] ~ V - F(F' GF) | Fl. (6.12) 


Note that R has rank M — m and thus cannot be inverted. The standardized residuals 
s are given by: 


s =r./,/diag(R), (6.13) 
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where ./ denotes the point-wise division of two vectors or matrices (MATLAB® 
convention). They are approximately distributed according to a standard normal 
distribution. Outliers are characterized by an unusually large value of s. 

The residuals r, i.e., the differences of the measured track and the fitted track 
are a superposition of measurement noise and process noise (mainly multiple 
scattering). The vector r of residuals can be further decomposed into two parts 
corresponding to the two types of noise [1]: 


r=rm+rs, with (6.14) 
rM =VmG'm, Ry = Var[rm] = VMG Vw, (6.15) 
rs = VsG'm, Rs = Var[rs] = VsG' Vs, (6.16) 
G' =GRG. (6.17) 


The two noise contributions can thus be checked independently via their standard- 
ized residuals sm and ss, given by: 


sum = ry -/ y diag (Ry) (6.18) 
ss = rs ./ / diag(Rs) (6.19) 


6.1.2 Extended Kalman Filter 


A "progressive" or recursive version of the LS regression for track fitting was first 
proposed in [2]. Soon, it was realized that this was the same as an (extended) Kalman 
filter [3] in the state space model of the track dynamics. The Kalman filter has the 
advantage that only small matrices have to be inverted, that the track fit follows the 
track as closely as possible, and that material effects such as multiple scattering and 
energy loss can be treated locally in each measurement or material layer. In addition, 
the filter can be complemented by the smoother; see Sect. 3.2.3. 

In track fitting with the extended Kalman filter, it is assumed that the trajectory 
crosses a number of surfaces or layers with well-defined positions and orientations. 
A layer can be a measurement layer, a material layer, or both. At the intersection 
point of the trajectory with layer k, the state vector q contains information about 
the local position, the local direction, and the local momentum of the track. The 
uncertainty of the information is specified by the associated covariance matrix Cx. 
Different possible parameterizations of the track state are discussed in Sect. 4.2. 

The Kalman filter is a sequence of alternating prediction and update steps; 
see Sect. 3.2.3. In the prediction step, the estimate q,_, of the track state in layer 
k — 1 is extrapolated to layer k, along with its covariance matrix Cx_1: 


Qi = fica D. Cik-1 = Fer-ıCk-ı Fig (6.20) 
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where fx]x-ı is the track propagator from layer k — 1 to layer k, and Fx \x—1 is its 
Jacobian matrix; see Sect. 4.3. 

The update step is different in material and detector layers. In a material layer, 
multiple scattering is taken into account by inflating the covariance matrix elements 
of the track directions and, in a thick scatterer, of the track position; see Sect. 4.5. 
Energy loss by ionization is taken into account by decreasing the track momentum. 
For the treatment of electron bremsstrahlung, see Sect. 6.2.3. 

The update step in a detector layer is given by Eqs. (3.29) and (3.30) or 
Eqs. (3.31) and (3.32). The associated chi-square statistic X Eq. (3.35), can be 
used to test the compatibility of the observation m; with the predicted state q kIk-1 
in the combinatorial Kalman filter; see Sect.5.1.7. A large value of Xi or a small 
p-value indicates that the observation does not belong to the track. 

In track fitting, the initial state gq is obtained from the track finder and therefore 
contains information from the observations. In order not to use the information 
twice, its covariance matrix is set to a large diagonal matrix. As a consequence, the 
initial chi-square statistics Xi have zero degrees of freedom, until the covariance 
matrix of the state vector has full rank. For instance, if all measurements are 2D, 
and the state is 5D, x : and x have zero degrees of freedom, xà has one degree of 
freedom, all subsequent XQ have two degrees of freedom and the total chi-square 
statistic x2, has 2n — 5 degrees of freedom. 

The smoother can be implemented either according to Eqs. (3.36) and (3.37) or 
by running a second filter in the opposite direction and combining the states of the 
two filters by a weighted mean (Eq. (3.38)): 


-1 E 
T Sis b zb = -1 b 
Ikın = Ck|n |c: qk + (um ilia] QCqQ,—C, + (Chur) ; 


where 4 h x41 ÍS the predicted state from the backward filter and C kI ka] its 
covariance matrix. Alternatively, the predicted state from the forward filter and the 
updated state from the backward filter can be combined. The associated chi-square 
statistic Xj „ (Eqs. (3.39) and (3.40)) can be used to test the compatibility of the 
observation m; with the smoothed state d ,,, using the entire track information. A 
large value of X n or a small p-value indicates that the observation does not belong 
to the track and is an outlier; see Sect. 6.4.2. A simplified version of the chi-square 
statistic X n is used in the deterministic annealing filter; see Sect. 6.2.2. 


6.1.3 Regression with Breakpoints 


Instead of absorbing the effects of multiple scattering in the covariance matrix 
of the process noise, the scattering angles at certain points, called breakpoints, 
can be explicitly incorporated into the track model as additional parameters. At 
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these breakpoints, the scattering angles are estimated [4, 5], using their known 
expectation (zero) and known covariance matrix in the curvilinear parameterization; 
see Sect. 4.5.1. The breakpoint fit is mathematically equivalent both to LS regression 
in the linear approximation of the model and to the Kalman filter, unless the initial 
state of the latter has non-negligible information. 

Let 0; = (91, 92), j = 1,...,m denote two uncorrelated multiple scattering 
angles at the breakpoint j, and Q; their covariance matrix. Then the regression 
model in Eq. (6.1) can be modified to: 


my = f (p91, .-., 04) + Ex, E[ex]=0, Var[ek] = Vi, ko L...,n, 
(6.22) 


where jy is the index of the last breakpoint before measurement layer k. All 
measurement errors €g are now independent, and their joint covariance matrix is 
block-diagonal, as is the joint covariance matrix of all 0 ;. The full regression model, 
which includes the scattering angles as additional parameters, now reads: 


m, f\(p. 01, -.-,9),) 
Mn | = | Fn(PP1,...,0m) eg with (6.23) 
0 1 
0 On 
E[8] —0, Var[3] = blkdiag(V,...,Vn, Q1,---, Qm). (6.24) 


If the functions f j are Taylor-expanded to first order, a linear regression model with 
the following structure is obtained: 


mı Fı Hip- Hj O--- 0 
GENE a oe p 
Mn Fa Hal e cee ee Hm 01 
= . : 6.25 
0 O I1 OQ vee e 0 TE (6.25) 
: Om 
0 O O I 


where Hj, j < jk is the Jacobian matrix of the function that describes the 
dependence of m; on 0; and F is as in Eq. (6.3). 

The following two subsections describe two efficient implementations of the 
breakpoint concept. 
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6.1.4 General Broken Lines 


The general broken lines (GBL) algorithm [6] is a fast track refit based on 
the breakpoint concept. It is particularly useful for track-based alignment and 
calibration with the Millepede-II package [7]. It is assumed that only thin scatterers 
are present, or that thick scatterers are divided into several thin scatterers. The 
algorithm needs an initial trajectory as a reference. 

At each measurement plane and thin scatterer, a local orthonormal coordinate 
system (u, v, w) is defined. The natural choice of the w-axis is perpendicular to the 
sensor plane for a measurement and parallel to the track direction for a scatterer. At 
each thin scatterer, the offset (u, v) is a fit parameter, as is the global signed inverse 
momentum q / p. The prior information on the scattering angles is the same as above, 
obtained from multiple scattering theory; see Sect. 4.5.1. The transformations from 
the curvilinear frame following the track to the local frames are given in Sect. 4.4.1. 

The corrections to the track parameters in a measurement plane depend only 
on the (inverse) momentum and the adjacent offsets; therefore, their estimation 
requires the inversion of a bordered band matrix, the computing time of which is 
proportional to the number of breakpoints. The GBL is available in a dedicated 
software package [8] and is also implemented in GENFIT [9-13]. 

A comparative study in [6] shows that track fitting with the GBL is a little faster 
than the extended Kalman filter and up to three times faster than the Kalman filter 
plus smoother. 


6.1.5  Triplet Fit 


A track fit for situations in which the principal source of stochastic uncertainty is 
multiple scattering is described in [14]. The fit is an independent combination of 
fits to triplets of hits, where the middle point of a triplet is a breakpoint. As the 
triplet fits are fast and can be parallelized, the method is well suited for online 
reconstruction. The fit has been designed for low momentum tracks with several 
turns in the detector, but performs also very well for tracks in a high-resolution 
pixel tracker with momenta up to 10 GeV. For a comparison with a full helix fit 
and the GBL fit (Sect. 6.1.4) see [14]. The fit is implemented in a software package 
called WATSON [15] which is available on request from the authors of [14]. 


6.1.6 Fast Track Fit by Affine Transformation 


Online track reconstruction in the first-level trigger, see Sect. 5.2, requires an ultra- 
fast track fit that can be implemented in high-speed hardware such as FPGAs [16]. 
One possibility to achieve the required speed is to fit an affine model to a training 
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sample of simulated tracks that expresses the track parameters p as an affine 
function of the measurements m [17]: 


p=Am-+e. (6.26) 


The matrix A and the vector c are estimated by minimizing the objective function 


N 
S(A,c) =) (Ami + c- pj) (Ami c — pi), (6.27) 


i=l 


where the p; are the true parameters of the N tracks in the training sample. The 
solution is given by: 


A = | (pm™) - (p) (m™)| C-!, c = (p) - Am), C = (mm) — (m) m. 
(6.28) 


where C is the sample covariance matrix of the measurement vectors, and the angle 
brackets denote the average over the training sample. The goodness-of-fit can be 
judged by the chi-square statistic 


x? = [m — (m)]! C^! [m — (m)], (6.29) 


which is approximately x?-distributed with M — m degrees of freedom, where M — 
dim(m) and m = dim(p). If the track model is exactly linear, i.e., m = F p with 
some matrix F, the matrix A is equal to the Moore-Penrose pseudoinverse of F, 
and the rank of C is m, so that C has m positive eigenvalues while the remaining 
ones are equal to 0. If the linear approximation to the track model is valid in the 
entire training sample, C has m large and M — m small eigenvalues. The training 
sample, therefore, has to be chosen such that this condition is satisfied. In practice, 
this means that the detector volume has to be partitioned into many small regions, 
each with its own training sample. The training automatically takes into account 
material effects, measurement errors, misalignment, and the configuration of the 
magnetic field. 


6.2 Robust and Adaptive Fitting 


6.2.1 Robust Regression 


The LS regression described in Sect. 6.1.1 is not robust in the sense that outliers in 
the track candidate lead to a significant distortion of the estimated parameters. One 
way to cope with this problem is to look for outliers in the standardized residuals of 
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the regression and to remove them; see Sect. 6.4.2. A more elegant way is to make 
the regression robust by minimizing an objective function that is different from the 
sum of squared differences between the model and the measurements. 

Three important approaches to robust regression are Least Median of Squares 
(LMS), Least Trimmed Squares (LTS), and M-estimators [18, 19]. LMS regression 
is difficult to compute in the context of track fitting; it is statistically less efficient 
than either LS or LTS, and there is no simple prescription for the computation 
of the covariance matrix of the estimated parameters. The computation of the 
LTS estimator requires the solution of a combinatorial optimization problem and 
is therefore rather time-consuming. The M-estimator, on the other hand, can be 
implemented as an iterated re-weighted LS regression and is therefore an excellent 
method for a robust track fit. The re-weighting can be done on single components 
of the measurement vectors m; or on entire measurement vectors. 

An M-estimator is characterized by a function w(z) that determines the corre- 
sponding weight function w(z) = Y(z)/z, where z is the standardized residual of a 
measurement. Setting Y (z) = z yields the LS estimator. Table 6.1 and Fig. 6.1 show 
three examples of w and weight functions from the literature. 

The constant c controls the shape of the weight function. The weight function of 
Huber’s M-estimator [19] is equal to one in the interval [—c, c] and slowly drops 
to zero outside the interval (see Fig. 6.1a). Tukey’s biweight [20] is a redescending 
estimator for which the weight function is equal to zero outside the interval [—c, c] 
(see Fig. 6.1b). For the adaptive M-estimator [21], c is the point where the weight 
function is equal to 0.5. This estimator has an additional control parameter T that 
modifies the transition from large to small weight and can be used for implementing 
an annealing procedure; see Sect. 6.2.2. In the limit of T — 0, the estimator is 
redescending, as the weight function approaches a step function that drops from 1 
to 0 at z = c (see Fig. 6.1c). The computation of the M-estimator is summarized 
in Table 6.2. In order to be less sensitive to multiple scattering, it uses only the 
measurement component sm of the standardized residuals; see Eq. (6.18). 


Table 6.1 The y functions and the corresponding weight functions w of three M-estimators. c and 
T are constants. Further explanations can be found in the text 


Name w(z) olz) 
Huber a Se NIIT 
c-sign(z), if Iz|» c c/|z|, if |z| » c 
1— 2/22» if < 1— zZ 212 if < 
Tukey’s biweight al ce y, if lse ( RS E if |z| xc 
0, if |z| 2» c 0, if |z| » c 
z exp(—z?/2T) exp(—z?/2T) 


Adaptive 


exp(—z2/2T) + exp(—c?/2T) exp(—z2/2T) + exp(-c?/2T) 
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Fig. 6.1 The weight functions of the three M-estimators in Table 6.1. (a) Huber type with c = 2. 
(b) Turkey's biweight with c = 3. (c) Adaptive with c = 3 


6.2.2 Deterministic Annealing Filter 


The deterministic annealing filter (DAF) is an adaptive version of the standard 
Kalman filter [22]. It can modify the influence of outlying measurements by 
assigning them a smaller weight. It can also deal with the case that two (or more) 
measurements in the same detector layer are tagged by the track finder as valid 
candidates for inclusion in the track. This is particularly relevant in the LHC 
experiments, given the high track density in the central tracker. It is then up to the 
track fit to decide which of the competing measurements, if any, is most compatible 
with the local track state. Another use case is the choice between two mirror hits in 
a drift chamber. The DAF is implemented in GENFIT [9-12]. 
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The measurements in layer k are denoted by mi, j = 1,..., Mg, and their 


covariance matrices by y! . The DAF is implemented as an iterated Kalman filter 
plus smoother with annealing; see Table 6.3 for the basic sequence and [23, 24] for 
implementation details. In each layer, the state vector is updated with a weighted 
mean of the measurements, using the covariance matrices of the current iteration. 


Table 6.2 Algorithm: Track fit with M-estimator 


TRACK FIT WITH M-ESTIMATOR 


1. Sete = 1076, V =V, Pm = Vy, and compute the LS regression. 
2. Compute the standardized residuals sm in Eq. (6.18). 
3. Compute the weights w; of the components sm, j of sm according to: 


wj =max[e,o(sm,j)], j=1,...,M. (6.30) 


Set Vm = Vy./diag(w) and V = Vy + Vs. 
4. Recompute the LS regression, set V — V and repeat from step 2 until convergence. 


Table 6.3 Algorithm: Deterministic annealing filter 


DETERMINISTIC ANNEALING FILTER 


1. Set e = 1076, set the iteration counter IZ = 0 and Vi(0) = Vi, Jj=|1....,Mk, k= 
1,...,n. Choose a threshold xà and an initial temperature 79. 
2. Run a Kalman filter plus smoother, using the covariance matrices Vi (I) in all updates. 


3. For all layers k = 1,...,n, compute the weights wi of measurements mi, Jm; uM 


j_ exp —X] /27,) 
w, = max | e, ep C X2 M; —XÜDT. > 
p(—x?/2Tk) + 3,2, exp(—X;./2T;) 


xi = Imi = hau]. [vio] B [mi LUI j 


X A is the x?-distance of m] from the smoothed state in layer k in the metric defined by the 
inverse of Vi (0), see Eq. (3.39). 

4. Increase the iteration counter / by 1 and set Vi Wd) = Vi (0)/wi, jz=il,...,Mr k= 
| RR 

5. Set the new temperature Ty according to the chosen annealing schedule and go to step 2. 

6. When the final temperature is reached, iterate until convergence. 
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6.2.3 Gaussian-Sum Filter 


The Kalman filter is (near) optimal if the track model is (approximately) linear, and 
both system and measurement noise are (approximately) Gaussian; see Sect. 3.2.3. 
If the noise is long-tailed or highly asymmetric, the Gaussian-sum filter (GSF) is an 
alternative estimator. It can be applied to a broad class of non-Gaussian distributions 
by allowing all densities involved to be mixtures of normal PDF s or Gaussian sums. 
The GSF can be applied in the following use cases: 


1. Long-tailed measurement errors. The distribution of the measurement errors is 
contaminated by frequent outliers and can be modelled by a mixture of two 
Gaussians, the “core” and the “tails” [25]. 

2. Thin scatterers. The distribution of the multiple scattering angle in thin layers 
is non-Gaussian because of its long tails [26], but can be approximated by a 
Gaussian sum with two components [27]; see Sect. 4.5.1. 

3. Inhomogeneous scatterers. Multiple scattering in an inhomogeneous material is 
often treated by computing an average thickness and an average radiation length 
(Eq. (4.79)). With the GSF, it is possible to describe the angular distribution by a 
Gaussian sum, with one or two components for each type of material [28]. 

4. Energy loss by bremsstrahlung. The distribution of the energy loss caused by 
bremsstrahlung of electrons is very far from being Gaussian. While the Kalman 
filter is restricted to using the first two moments (mean and variance) of the 
distribution, an approximation by a normal mixture allows the GSF to take into 
account more details of the shape of the energy loss PDF [29, 30]; see Sect. 4.5.3. 


In the GSF, the PDF of the state vector can be a Gaussian sum at every surface. First 
assume that the surface is a material surface and that the predicted state vector q at 
the entry of the surface has the following normal mixture PDF with J components: 


J J 
po(q) = Y moa: q; Cj), X m=, (6.31) 
j=l j=l 


where ø (q; qj, Cj) is the normal PDF with mean q; and covariance matrix Cj, j = 
1,..., J. The process noise y in the surface is modeled by a normal mixture as well 
(see also Sect. 3.2.3.2): 


M M 
EV) =J Ome (Ys Em Qu). I, Om = 1. (6.32) 


ml m=1 


Note that both J and M may be equal to one. Then the PDF of the state vector at the 
exit of the surface is given by the following normal mixture with J x M components: 


J M 
pi(q) = 9 >. njono(q:4; + 8m Cj + Qm) (6.33) 
J=lm=1 
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Now assume that the surface is a measurement surface and that the distribution 
of the measurement error of measurement m is modeled by a normal mixture: 


M M 
g(e) = 9 5 on9(6:0, Vm), Yo om = 1. (6.34) 


m=1 m=1 


Again, J and M may be equal to one. The PDF of the updated state vector is given 
by: 


J M 
pi(q) 2 3, nimpa: 4 jm Cjm), (6.35) 


j=l m=1 


with 


J M 
Nim X nj Omp(m; h(q;j), Vm + HC;H'), X Y nim =1. (6.36) 
j=l m=1 


The mean q ;,, and the covariance matrix C jm are obtained by the Kalman filter 
update of component j of the predicted PDF po(q) with component m of the 
measurement error PDF g(e), see Eqs. (3.29) and (3.30) or Eqs. (3.31) and (3.32). 

Finally, assume that there are M measurements mm in the measurement surface, 
with weights œm, m = 1,..., M. In the absence or prior information, all weights 
are set to 1/M. The observation m can then be modeled by the following normal 
mixture PDF : 


M M 
g(m) = ^ onom; mm, Vn), 3 joy — 1. (6.37) 


m=i m=1 


The PDF of the updated state vector is given by: 


J M 
PO =Y Y jmo jm Cim), (6.38) 
j=lm=1 
with 
J M 
Nim X Tj Om Mm; h(qj), Vm + HC;H'), 32 nim =1. (6.39) 


j-2lm-l 


The mean q ;,, and the covariance matrix C jm are obtained by a Kalman filter update 
of component j of the predicted PDF po(q) with component m of the observation 
PDF g(m). The resulting GSF is basically a combinatorial Kalman filter in which 
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each track candidate has an additional weight which shows how likely it is compared 
to the other ones. This version of the GSF can be used for track finding; in this case, 
a missing hit with large errors should be added in each layer [31]. Its weight can 
reflect the hit efficiency of the measurement device. 

In principle, the number of components rises exponentially in the course of the 
GSF, and it is necessary to reduce their number whenever it exceeds a threshold 
K set by the user. The simplest way of reducing the number of components is to 
keep the K components with the largest weights, drop the remaining ones, and 
renormalize the weights to a sum of one. A more sophisticated approach is to search 
for clusters among the components and to collapse the components in a cluster to 
a single Gaussian with the same mean and covariance matrix. Clustering can be 
based on the similarity between components, as measured by, e.g., the Kullback— 
Leibler divergence [30]. For a brief review of clustering algorithms, see Sect. 3.3. 
The choice of the threshold K and the clustering procedure have to be optimized by 
simulation studies. 

Even for moderate values of K, the GSF is significantly slower than the Kalman 
filter; it is, therefore, used mainly for special applications such as the track fit of 
electrons with non-negligible bremsstrahlung [30, 32, 33]. 


6.3 Linear Approaches to Circle and Helix Fitting 


6.3.1 Conformal Mapping Method 


The conformal transformation described in Sect.5.1.1 can be generalized to deal 
with circles passing close the origin [34]. The conformal transformation maps such a 
circle to a circle with a very small curvature, which in turn can be well approximated 
by a parabola: 


1 a RA 2 
v= :u—e " u^, (6.40) 


where e = R — Va? + b? is the impact parameter. A standard parabola fit to the 
measurements in the transformed (u, v)-coordinates yields the parameters A, B and 
C according to 


v=A+Bu+Cu, (6.41) 


and the circle parameters are therefore given by 


1 p? 
b = —, a=-bB, e=-C 


2A | (a2 + b ae 
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using the approximation R ~ Ya? + D? in the expression of e [34]. Following 
Gluckstern [35], it is also possible to obtain expressions of the estimated errors of 
the circle parameters [34]. 


6.3.2 Chernov and Ososkov's Method 


The task of fitting a circular track to a set of measurements is tantamount to 
minimizing the function 


= ds (6.43) 


where d; are measurement residuals orthogonal to the particle trajectory: 


ay =| fe =a)? eo =? = n ion, (6.44) 


where a, b, and R are the coordinates of the circle centre and the radius. The 
approach of Chernov and Ososkov [36] is to simplify this non-linear minimization 
problem by introducing an approximate expression for the residuals d;, 


d; x les — ay + (y - b} — re] /2R, (6.45) 


which holds true with high precision as long as the residuals are small compared 
to the circle radius. The equations obtained by differentiating x” with respect to 
the circle parameters and setting these to zero are quartic (polynomial equations of 
degree 4) and can be solved efficiently by a standard Newton iteration procedure. 


6.3.3 Karimäki’s Method 


Karimäki’s approach [37] starts from the simplified expression of the residuals 
d; introduced by Chernov and Ososkov [36] and considers a x? with weighted 
residuals: 


n 
= y ud (6.46) 
i=l 


The weights can for instance contain measurement uncertainties if they are not the 
same for all measurements (x;, yi). 
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The x? function is minimized with respect to a set of circle parameters with 
Gaussian behaviour: the curvature « (the inverse radius of curvature), the impact 
parameter e (the distance from the origin to the point of closest approach of the 
fitted circle), and the direction $ of the tangent of the circle at the point of closest 
approach. Using this set of parameters, the simplified residuals are expressed as 


1 1 
d; — zen? — (14+ k£)ri sin($ — Qi) + gy € +e, (6.47) 


where r; and ġ; are the polar coordinates of measurement i. The residuals can be 
written as d; = (1 + ke) ni, with 


ni = yri — ri sin(d — $i) + ô, (6.48) 
and 
K 1+xe/2 
= ee 6.49 
Y — 204 «e) lene” a) 


Using these definitions, the x? can be written as 
x? = (14 Ke) x?, (6.50) 
where x? = Yo) wi n. With the approximation 1 + «e © 1, X? can be minimized 


instead of x”, leading to the attractive feature of a set of equations with explicit 
solutions: 


= 5 arctan (2g; /42). (6.51) 
y = (sind - Cy, — cosó - Cyz) / Cs, (6.52) 
ô = —y (z) +sind(x) — cosó(y), (6.53) 


where q1 = C;;C,, — Cy; Cy; and q2 = C; (Cx — Cy) — C2, + C2. and the angle 
brackets () denote a weighted average, e.g., (x) = I; wixi/ Y; wi. The variances 
and covariances of the measurements x, y and z = x? + y? are given by 


C = (x°) — 6, Cay = (xy) - (xy), Cyy = 07) - OY, 


X. 
(6.54) 
Cx; = (xz) — (a2), Cyz = (yz) — OHZ), Cee = (rf) = (2). 
The curvature « and impact parameter e are given by 
2 26 
pe (6.55) 


; €= : 
„1- 4ôy 14 1-4óy 


Expressions of the uncertainties of the estimated parameters are available and can 
be found in [37]. 
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6.3.4 Riemann Fit 


The Riemann circle fit [38] is based on the fundamental theorem in complex analysis 
that circles and lines in the plane correspond one-to-one to circles on the Riemann 
sphere. Since a circle on a sphere is the intersection of a plane with the sphere, 
there is a one-to-one correspondence between circles and lines in the plane and 
planes in space. The problem of fitting a circle to a set of measurements in the 
plane can, therefore, be transformed into the problem of fitting the transformed 
measurements to a plane in space. The latter problem can be solved directly by 
non-iterative methods. 

The mapping of a point (u;, vj) in the plane to the transformed point (x;, Yi, zi) 
on the Riemann sphere is given by 


xj =w/l+u + vi’), 
yi =u) / + ui? + vi’), (6.56) 


2 2 2 2 
zi = (ui^ + vi )/ + ui^ + vi^). 
The denominator in the expressions of the transformed measurements leads to small 
distances between the transformed measurements and the fitted plane for large 


radii R; = (ui? + vi?) ? in the plane. In an attempt to satisfy the Gauss-Markov 
conditions as closely as possible, a radius-dependent scaling factor was introduced 
in the fitting procedure in [39]. It was realized in [40] that this scaling factor could be 
omitted by mapping the points in the plane to a paraboloid rather than the Riemann 
sphere, 


M= ui, yi =V; Zi =U +v’, (6.57) 


leading to the same values of the estimated parameters if the measurements are at 
fixed radial positions. 

Fitting a plane in space to the n measurements on the paraboloid is tantamount 
to minimizing the objective function 


n 


2 n 492 
Sen 2 Y? (c 4- nix tran tma -75 (6.58) 
i=l i i21 `i 


with respect to c and n = (n1, n5, n3)! with the constraint that n is a unit vector. 
This is achieved by choosing n as the unit eigenvector corresponding to the smallest 
eigenvalue of the sample covariance matrix A of the measurements: 


jus | + 
A= = 2,73 (ri — reg) (ri — reg) : (6.59) 


; Oo: 
i=l L 
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The constant c is equal to c = —nl 


Y^; wir; and the weights 


rcg With the centre of gravity vector rcg — 


wi = m It el. (6.60) 
i > 1/o7 


Given the parameters of the fitted plane, a suitable set of circle parameters can be 
derived. For example, the parameters chosen in Sect. 6.3.3 are given by: 


$ = arctan (2) ; (6.61) 
ni 


2n3 


1 - ni — 4cn3 
i n2 4cn3 J ni 


2n3 


(6.62) 


kK=S: 


(6.63) 


e=s: 


up to a sign s = +1. One possible convention of determining s is given in [41]. 

Expressions of the uncertainties of the estimated circle parameters are given 
in [41] for measurement uncertainties both in the transverse and in the radial 
direction. Effects of multiple Coulomb scattering can also be included in this 
approach, essentially by modifying Eq. (6.59) to include correlations between all 
measurements due to multiple scattering [40]. A robust version of the Riemann fit 
based on LMS regression is proposed in [42]. 


6.3.5 Helix Fitting 


Linearized helix fitting can be done by first estimating the parameters of the circle 
that results from the projection of the helix on the transverse (bending) plane. Any 
of the methods described above can be used for this. If the detector system at hand is 
of a barrel-type, so that the radial positions of the measurements are known to a very 
high precision, the path lengths to the intersections between the fitted circle and the 
detector elements can be obtained from the circle parameters. For the Riemann circle 
fit, these path lengths can be found directly from the knowledge of the parameters 
of the fitted plane [43]. The longitudinal (non-bending) plane parameters can then 
be found by solving the linear regression model: 


z=Apte, A=|::|, (6.64) 
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where z is the vector of z measurements and the parameter vector p is given by: 


a; Z — 
p= i ,) , Varfe]=V,, (6.65) 


where the dip angle A = 2/2 — 0 is the complement of the polar angle 0, and V, is 
the covariance matrix of the measurements (containing contributions from multiple 
scattering, if desired). From the fitted parameters, 0 and z in the innermost layer can 
be immediately obtained. 

In a forward-type detector, the z positions are known very precisely, whereas the 
radial positions of the measurements are observed. In this case, the regression is 5 
on z. A suitable regression model is then: 


lzi 


s=Ap+e, A=|::|, es Var[e] = Vr, (6.66) 


1 Zn 


since the covariance matrix of s is a very good approximation to the covariance 
matrix of R. From the fitted parameters, the polar angle 0 of the track and the s- 
values in all layers can be immediately determined, and, if desired, the predicted 
radial positions of all measurements [43]. If higher precision is needed, the circle 
and line fits can be iterated. 


6.4 Track Quality 


6.4.1 Testing the Track Hypothesis 


The principal test statistic of the track hypothesis, i.e., of the hypothesis that all 
measurements in the track are generated by the same charged particle, is the total 
X? of the track. It is exactly x?-distributed if, and only if, the following conditions 
are met: 


1. the track model is exactly linear; 

2. the measurement errors are normally distributed with mean zero and have the 
correct covariance matrix; 

3. the material effects are normally distributed and have the correct covariance 
matrix; 

4. the estimator is the LS estimator and thus a linear function of the measurements. 


Obviously, these conditions are met very rarely, if ever, in the experiment. In most 
circumstances, the track model is the linear approximation of a non-linear one; the 
measurement errors are not strictly normal, and the calibration of their covariance 
matrix is not perfect; the distribution of the multiple scattering angle has tails that 
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contradict the assumption of normality; the estimated track parameters may be 
distorted by outliers; and the estimator may be a robust version of the usual LS 
estimator. As a consequence, the best one can hope for is that the total x? is at 
least approximately x?-distributed. Its distribution for a sample of tracks can be 
visualized by a histogram of the p-values, defined by: 


p- f PCE (6.67) 
x2 


where gq(x) is the PDF of the x?-distribution with d degrees of freedom. The 
number of degrees of freedom is the sum M of the measurement dimensions m; 
minus the dimension of the track parameter vector. In the ideal case, the p-values 
are uniformly distributed in the interval [0, 1]. In practice, one frequently observes 
a fairly uniform distribution with a peak at zero, where defective and fake tracks 
accumulate. 

Besides the total chi-square statistic, the track length and the number of holes or 
missing measurements is an indication of the track quality. As the number of degrees 
of freedom of the track fit is the same as the number of geometrical constraints 
imposed on the measurements, a long track is much less likely to be a fake track 
than a short track. On the other hand, an outlier has less effect on the total x? in 
a long track than it has in a short track. This can be demonstrated by a simple 
example. Assume a perfect sample of tracks with four measurements of dimension 
two. The fit of five track parameters leaves three degrees of freedom. A x?-cut at 
the 99%-quantile g0.99,3 = 11.345 rejects 1% of the tracks. Assume that one of 
measurements is replaced by an outlier, thereby increasing the total x? of every 
track by 3. Now the same cut rejects 4% of the tracks. Under the same assumptions, 
but with ten measurements and 15 degrees of freedom, the cut rejects only 2.4% of 
the tracks with outliers. 

If the efficiency of the tracking detectors or sensors is known with good precision, 
a rigorous test on the allowed number of holes can be constructed. If, for the 
sake of simplicity, it is assumed that the efficiency € is the same for all sensors 
contributing hits to a track candidate, and that the occurrence of holes is independent 
across sensors, the number h of holes in a track with n measurements is distributed 
according to a binomial distribution: 


P(h) = (n) (1 — ete, (6.68) 


If € = 0.98 and n = 15, then P(1) = 0.23 and P(2) = 0.032, so a single hole is 
not suspicious at all, and two holes are unlikely, but not impossible. If n = 6, then 
P(1) = 0.11 and P(2) = 0.0055, so a single hole is possible, but the occurrence of 
two holes just by chance is very unlikely, in which case the suspicion of a fake track 
or a contaminated track is well-founded. 
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6.4.2 Detection of Outliers 


In the context of track fitting, an outlier is defined as a measurement that does not 
follow the expected behaviour. This may be put into statistical terms by saying that 
a measurement is considered as an outlier whenever its distance from the locally 
estimated track position is too large under the assumption of normal measurement 
errors, the distance being expressed in terms of the covariance matrix attached to 
the measurement. 

Outliers can be classified into track-correlated and track-uncorrelated ones [44]. 
Some sources of track-correlated outliers are: 


* Ambiguous measurements. Some tracking detectors, in particular drift chambers, 
give rise to ambiguous information; see also Sect. 1.2.3. The track search, being 
less restrictive than a rigorous track fit, is not always able to decide which of the 
two possible solutions is the correct one, and the decision must be deferred to the 
track fit. In the track fit, the wrong solution is regarded as an outlier which has to 
be spotted or suppressed. 

* Delta rays. Delta rays are energetic ionization electrons that leave a trail of 
secondary ionization in the detector and can cause a shift in the measured 
position. 

* Cluster merging. In a silicon sensor or in a gaseous detector, two clusters 
belonging to particles that are close in space may merge to a single cluster that is 
biased with respect to both true positions. 

* Cluster decay. Similarly, a large cluster may decay into two clusters, which are 
both biased with respect to the true position. 

* Non-normal measurement errors. Although the bulk of the measurements follows 
a normal distribution in most tracking detectors, there is nearly always a small 
fraction of the data that deviate from the normal law. These data show up as long 
tails in the error distribution and look like outliers. 

e Faulty covariance matrix. The errors attached to the measurement are too small, 
because of insufficient calibration, fluctuations of the signal, wrong assumptions 
about the track angle with respect to the sensor, dead channels, or other detector 
problems. 


Track-uncorrelated outliers are signals that are not caused by the track, but are 
nevertheless picked up by the track search. They may be, for instance, signals from 
adjacent tracks, ghost hits in double-sided silicon sensors, see Sect. 1.3.1, or noise. 

Whatever the source, an outlier can be detected by a test based on the residuals 
of the measurements with respect to the estimated track position. In the case of 
a single outlier, the test is most powerful if the estimate contains the information 
of all the other measurements. This is done most easily in the state space model of 
the track; see Sects. 3.2.3 and 6.1.2. Let mg be measurement under scrutiny, q,,, the 
smoothed residual and C, its covariance matrix, see Eqs. (3.39) and (3.40) and end 
of Sect. 6.1.2. The compatibility of m; with q,,, can be checked component-wise 
on the basis of the standardized residuals, or globally on the basis of the chi-square 
statistic: 
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Nien = (rin) (Rijn) "ipis (6.69) 


If there are no outliers, the standardized residuals should be compatible with a 
standard normal distribution, and x H „ Should be compatible with a x>-distribution 
with m; degrees of freedom, where m; is the dimension of mg. If m; is an outlier, 
this should be visible in the values of r;|„ and Xj us There is, however, a problem 
with this approach. Even a single outlier at position k introduces a bias in all of the 
states qj|,,1 7 k, so that also an inlier at position i # k can show abnormal values 
of the residuals and Xj „, especially if i is close to k. As a consequence, it is by no 
means obvious that m, can be correctly identified as the outlier. The situation is even 
worse if there are several outliers. In this case, a robust track fit that down-weights 
outliers, instead of trying to find and remove them, is a better solution; see Sect. 6.2. 


6.4.3 Kink Finding 


A charged particle decay that produces a single charged daughter particle plus 
some neutral ones manifests itself as a sudden change of the track direction and/or 
curvature, often called a kink or breakpoint.! Typical examples are the muonic 
decays of charged z and K mesons. Another source of kinks is hard elastic 
scattering on the material of the detector [45]. Collinear energy loss of an electron 
by bremsstrahlung does not result in a kink, but only in a change of curvature. 

It is characteristic for a kink that the track segments in front of and behind 
the kink both give a good fit to their respective track model; however, there is 
a significant difference between the two sets of track parameters estimated from 
the two track segments. In the Kalman filter framework, this difference and its 
covariance matrix are readily available at any layer k from the forward and the 
backward filter: 


Ak = qr — lt Var [Ag] = CA = Cc Char (6.70) 
The associated chi-square statistic is given by: 
Xå = AL Cae "Ar. (6.71) 
In [44] a x *-test statistic X? for kink finding is investigated: 


X? = max Ka (6.72) 


!In the literature on time series analysis it is called a change point. 
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where the range K of layers is restricted by the requirement that the respective track 
segments from the forward and the backward filter are well defined. Results from 
the simulation of m and K decays in a simplified setup are given in [44]. 

This simple test does not take into account the specific features of the process 
leading to the kink. In energy loss by bremsstrahlung, only the curvature changes; in 
hard elastic scattering, only the direction changes; in a decay, the direction changes 
and the momentum decreases, or curvature increases. In all three processes, the 
position of the two track segments has to be compatible. These features are taken 
into account by the modified track fit described in [45]. At each possible breakpoint, 
an extended set of track parameters œ is defined that allows for sudden changes in a 
subset of the track parameters. Three cases are considered: 


1. Energy loss by bremsstrahlung. The curvature is allowed to change; therefore, a 
contains two curvature parameters instead of one, for instance, kr and kp. 

2. Hard elastic scattering. Only the direction is allowed to change; therefore, œ 
contains two sets of direction parameters instead of one, for instance tan Ar, dr 
and tan Ap, dp. 

3. Muonic decays of x and K mesons. Direction and curvature are allowed to 
change; therefore, & contains two sets of the corresponding parameters instead 
of one, for instance, tan Ar, dr, kf and tan Ap, dp, Kb. 


At layer k, œ can be estimated by a linear regression in which q, and q h kr Play 
the role of the observations. This is equivalent to the minimization of the following 
objective function: 


So) = (4, - Hie)" Ci (Gy - Hra) + 
: T "m 
+ (Ar 7 Hva) (C) ^ (Eier - Hoe), (6.73) 


where H and H, are the matrices that project œ on q, andq h K+ > Tespectively. 

From the estimated vector w and its covariance matrix, standardized forward- 
backward differences of the relevant parameters can be computed. In addition 
an F-test can be performed to test whether additional parameters result in a 
significant reduction of the total chi-square statistic. The location of the breakpoint 
can be determined by the location of the largest discrepancy between forward and 
backward parameters, as measured by the value of S(œ) at the minimum. Results of 
studies of simulated pion decays in the NOMAD detector are shown in [45]. 

The breakpoint finder in [46] is based on the autocorrelation function of the 
residuals of the track fit. In an undisturbed track with many measurements, typically 
in a TPC, the residuals between the measured coordinates and the fitted trajectory 
are only weakly correlated so that the autocorrelation function of the residuals is 
close to zero for arbitrary lags. A breakpoint in the track introduces correlated shifts 
in all subsequent position measurements, resulting in an autocorrelation function 
that is significantly different from zero. Assuming 1D position measurements, the 
average autocorrelation of lag £ is defined by: 
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n—£ n—£ n—£ -12 


p cb» mel) eu» 5 ; (6.74) 


i=1 ied pel 


where r; = 6j/oj is the residual of measurement i divided by the standard error of 
measurement 7, and n is the total number of measurements. The test statistic A used 
in [46] is a weighted average of the autocorrelations up to a maximal lag L such that 
small lags have larger weight: 


L 


L 
2(L — £) 
A= , = —_—_—, = 1: 6.75 
» 2 we LED Dm (6.75) 


For the simulated data used in [46], setting Z equal to the nearest integer to n/8 gives 
the largest power of the test. In general, the maximal lag L and the weights we must 
be tuned on simulated data. The threshold of A above which the null hypothesis (no 
breakpoint) is rejected, is set according to the tolerated percentage of undisturbed 
tracks that are rejected, i.e., to the probability of an error of the first kind. 
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Part III 
Vertex Reconstruction 


Chapter 7 A 
Vertex Finding m 


Abstract Vertex finding is the search for clusters of tracks that originate at the 
same point in space. The chapter discusses a variety of methods for finding primary 
vertices, first in one and then in three dimensions. Details are given on model-based 
clustering, the EM algorithm and clustering by deterministic annealing in 1D, and 
greedy clustering, iterated estimators, topological vertex finding, and a vertex finder 
based on medical imaging in 3D. 


7.1 Introduction 


Vertex finding is the process of dividing all or a subset of the reconstructed tracks 
in an event into classes such that presumably all tracks in a class are produced 
at the same vertex. Vertices in an event can be classified as primary vertices or 
secondary vertices. In a fixed target experiment, a primary vertex is the point where 
a beam particle collides with a target particle; in a collider experiment, a primary 
vertex is the point where two beam particles collide. In the LHC experiments CMS 
and ATLAS, there are many primary vertices in each event because in each bunch 
crossing many collisions may and do occur. As a rule, however, at most one of these 
primary vertices is of interest to the subsequent analysis; this is called the signal 
vertex. The signal vertex is distinguished from the other primary vertices (pile-up 
vertices) by using kinematic criteria such as the transverse momenta of the tracks 
forming the vertex. 

A secondary vertex is the point where an unstable particle decays, or where a 
particle interacts with the material of the detector. As the search for secondary 
vertices is often based on a well-reconstructed primary vertex, this chapter deals 
only with primary vertex finding; secondary vertex finding is deferred to Chap. 9. 

In collider experiments, the position of the beam spot and the size of the luminous 
region contains precise prior information about the transverse position of primary 
vertices, see Fig. 7.1. The prior information on the position along the beam axis 
is much weaker, as the primary vertices can be anywhere in a zone defined by 
the bunch length, which is a couple of centimeters in the LHC. In fixed-target 
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Fig. 7.1 The transverse size of the luminous region of the LHC as determined by ATLAS in 
2017 [1]. Left: horizontal size; right: vertical size 


experiments, the prior information is given by the beam profile and the position 
and size of the target. 

Vertex finding methods can be roughly divided into three main types: generic 
clustering algorithms, topological methods, and iterated estimators. The latter can 
be considered as a special model-based clustering method. For a brief introduction 
into methods of clustering, see Sect. 3.3. 
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72 Primary Vertex Finding in 1D 


In the LHC experiments ATLAS and CMS, there are many beam-beam interactions 
during a single bunch crossing at the typical luminosity level of the collider. In 
order to find all primary vertices, first primary tracks are selected by a cut on their 
distance to the z-axis, which is the beam line. The selected tracks are then clustered 
on the basis of their z-coordinates at their point of closest approach to the centre of 
the beam spot. The vertex finding problem is thus reduced to clustering in a single 
spatial dimension. 


7.2.1 Divisive Clustering 


A simple divisive clustering can be performed by requiring a gap of at least d 
between adjacent clusters/vertices. The threshold d depends on the shape of the 
beam profile along z, on the expected number of interactions per bunch crossing, 
and on the precision of the z-coordinate used for clustering. It has to be optimized 
by studying simulated events. If subsequent validation of a cluster fails, it can be 
further divided by a more refined method (see Sect. 7.3.3). 


7.2.20 Model-Based Clustering 


Assume that there are n tracks with z-coordinates z;, i = 1,...,n, sorted in 
ascending order: z} < zo < ... < Zn. The z; are assumed to be sampled from 
a Gaussian mixture with the following PDF: 


K K 
f) => exe: ve od Yoo =1, (7.1) 
k=1 k=1 


where K is the number of mixture components, g is the PDF of the normal 
distribution, x is the component weight, v, is the mean value, and ae is the variance 
of component k, k = 1,..., K. As the association of the tracks represented by the 
points z; to the vertices represented by the component means v; is unknown, latent 
(unobserved) variables y;, i = 1,..., are introduced that encode the association: 


yi =k <> zi belongs to component k, i = 1,...,n. (7.2) 


The latent variables and the unknown parameters of the mixture (weights, means, 
variances) can be estimated by the Expectation-Maximization (EM) algorithm [2— 
4]. The EM algorithm is iterative, and each iteration consists of two steps. In the E- 
step (expectation step), the latent variables are estimated, given the observations and 
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the current estimate of the mixture parameters. In the M-step (maximization step), 
the maximum likelihood estimate of the mixture parameters is computed, using the 
estimates of the latent variables from the E-step. Convergence is guaranteed as the 
likelihood increases in every iteration. There is no guarantee, however, to reach the 
global maximum of the likelihood function. 

In the special case of a normal mixture, explicit formulas can be obtained. 
Assume that the M-step of iteration j gives the estimates cx (7). vx (j). or J), k= 
1,..., K. The E-step of the next iteration j + 1 computes the association probabil- 
ities or ownership weights pj x of all points with respect to all components: 


ox G) e Gi; wG), o2 G)) 
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In the M-step the mixture parameters are updated: 
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The EM algorithm suffers from the fact that the number K of clusters has to be 
selected in advance. A possible solution to this problem is to set K to the largest 
value that can reasonably be expected, and to merge components that are sufficiently 
similar after the EM algorithm has converged. 

The selection of the optimal number of components can be automatized by 
sparse model-based clustering [5, 6]. Sparsity of the final mixture is achieved by 
an appropriate prior on the mixture weights. As explicit formulas are no longer 
available, estimation of the mixture parameters has to be done via Markov Chain 
Monte Carlo (MCMC). A comparison with the EM algorithm can be found in [6]. 


7.2.3 EM Algorithm with Deterministic Annealing 


Assume that there are n tracks with z-coordinates z;, i = 1,...,n, again in 
ascending order, with their associated standard errors oj, i = 1,...,n. The EM 
algorithm described in Sect. 7.2.2 is sensitive to the initial values of the component 
parameters. This can be cured by introducing deterministic annealing (DA), see [7]. 
DA introduces a temperature parameter T, which is used to scale the standard errors 
of the zi, i = 1,...,n. Annealing starts at high temperature, corresponding to 
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large errors, and the temperature is lowered according to a predefined annealing 
schedule. At each temperature, the association probabilities of the data points z; 
to the current cluster centers are computed. If there are data points not associated 
to any of the clusters, indicated by very low association probabilities, one of these 
points is chosen as a new cluster center. The number of clusters is thus determined 
dynamically. The algorithm is summarized in Table 7.1. Note that in contrast to 
the model-based clustering in Sect. 7.2.2 the association probabilities are computed 
using the uncertainties of the track positions instead of the vertex positions. 

Two examples with ten randomly generated clusters within a space of 1 cm and 
oj = 0.01 cm are shown in Fig. 7.2. In the example on the top, the found clusters 
perfectly match the true clusters while in the example on the bottom, two true 
clusters merge into a single found cluster, and one of the found clusters is spurious 
with a single data point. 


7.2.4 Clustering by Deterministic Annealing 


Assume further that there are K vertex positions vj, k = 1,...,K, called 
prototypes, that represent K clusters of tracks. The expected discrepancy between 
data points and prototypes is given by: 


Table 7.1 Algorithm: Vertex finding with EM algorithm and deterministic annealing 


CLUSTER FINDING WITH EM ALGORITHM AND DETERMINISTIC ANNEALING 


1. Choose an initial temperature Tọ > 1, a cooling schedule Tọ, ..., Tm < 1, a threshold ô, and 
a robustness constant c € [2, 3]. 

2. Select the smallest data point z; as the first cluster center vj, set K = 1, t = 0, and T = T;. 

3. Compute the association probabilities p; according to: 


exp [-8 (zi — v?/o2] 


Pik = , with B= 1/T. (75) 
exp (-B.c?) + £} exp[-8 Gi — vj)?/0?] 
4. Update the cluster centers: 
n " 2 
vk = Ft ZB ES PIERRE © (7.6) 
i=1 Pi,k 


5. For each data point, compute p; = max, pi, and find the index j of the smallest p;. If 
pj <6,setK:= K +1, vk = zj and go to 3. 

6. Optional: Split “large” clusters by checking their width or their multiplicity. Assign data 
point z; to cluster k, if pj, > 0.5. If cluster k is to be split, set K := K + 1, set vy to the 
smallest data point in the cluster, set vx to the largest data point in the cluster, and go to 3. 

7. Ift « m, sett: t +1, T = T;, and go to 3. 

8. Assign data point z; to cluster k, if pj > 0.9, i = 1,...,n. 
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Fig. 7.2 Examples of cluster finding with EM algorithm and Deterministic Annealing. The results 
are discussed in the text 
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where C; is the cluster represented by vg and d(z;, vg) is a measure of distance 
between data point z; and the prototype v; [8]. A typical choice of d(z;, vg) is the 
weighted squared difference: 


PR 2 
de w = (Ë *) , (1.8) 
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where o; is again the standard error of track position z;. As there is no prior 
knowledge on the probabilities P(z; € Cx), the principle of maximum entropy 
can be applied, giving a Gibbs distribution: 


exp (— d (zi, vr)) 


ae 
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(7.9) 
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The parameter 6 = 1/T is the inverse of the “temperature” of the system. 
Finding the most probable configuration of the prototypes at a given temperature 
is equivalent to minimizing the “free energy” F, which is given by [8]: 


r^ - 5n ep0 B d(zi, v). (7.10) 
iml = 


Minimization of F has to be done by numerical methods; see Sect. 3.1. 

At infinite temperature, i.e., ß = (0, there is a single cluster containing all 
data points. The temperature is now lowered according to a predefined annealing 
schedule. At some positive value of f, the cluster will undergo a phase transition 
and split into smaller clusters. Annealing is continued until the end of the schedule. 
At every temperature, a characteristic number of effective prototypes emerges at 
distinct positions, independent of the number of prototypes. 

Instead of using a large number of prototypes, many of which may coincide at 
a given temperature, one can work with weighted prototypes, where the weight 
py corresponds to the fraction of unweighted prototypes that coincide at vg. The 
weights px always sum to 1, and the free energy is slightly modified: 


n K 
1 
F = -5 hin) px exp CP dCi, m). (1.11) 


i=l k=1 


The association probabilities and the cluster weight are given by: 


px exp (— B d (zi, vy)) = oe " 


: (7.12) 
Yo prexp (—B d (zi, vi)) 


Dik = Pi e Cy) = 


The annealing is started at high temperature with a single prototype of weight o; = 
1. If the distance function is chosen as in Eq. (7.8), the minimum of F is at the 
weighted mean of the data points. During annealing, the temperature is gradually 
decreased and the local minima of F emerge. The critical temperature 77 of cluster 
k is the point where a local minimum turns into a saddle point and is given by: 


TE nn Pi,k : (az) /5* Pi, (7.13) 


Whenever the temperature falls below the critical temperature of a cluster, the 
prototype of this cluster is replaced by two nearby representatives. The association 
probabilities and the new cluster weights are recomputed according to Eq. (7.12). If 
the final temperature is small enough, the soft assignments of data points to clusters 
turns into a hard assignment. 
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7.3 Primary Vertex Finding in 3D 


In principle, the clustering methods described above for vertex finding in 1D can 
also be applied to vertex finding in 3D. It has to be noted, though, that the shortest 
distance in space between two tracks is peculiar insofar as it does not satisfy the 
triangle inequality: if tracks a and b are close, and tracks b and c are close, it does 
not follow that tracks a and c are close as well. The distance between two clusters 
of tracks should therefore be defined as the maximum of the individual pairwise 
distances, known as complete linkage in the clustering literature. Alternatively, the 
distance between two clusters can be the distance between the two vertices fitted 
from the clusters. 


7.3.1 Preclustering 


Because of the high track and vertex multiplicity in typical collider experiments at 
the LHC, preliminary clusters of tracks can be formed by selecting a primary vertex 
or a small group of primary vertices found in 1D. All tracks that are compatible with 
these are put in a preliminary cluster. This reduces the combinatorics and opens the 
possibility of processing these preliminary clusters in parallel. Tracks that are not 
compatible with any primary vertex are reserved for secondary vertex finding. In 
low-multiplicity experiments, this preliminary clustering can be omitted. 


7.3.2 Greedy Clustering 


Greedy clustering is agglomerative and starts with a single track, preferably a high- 
quality track with many hits and good x? (see Sect. 6.4.1). It is combined with its 
nearest neighbour in 3D, and a vertex is fitted from the two tracks. If the fit is 
successful, the vertex is stored. The track nearest to vertex is added, for instance, by 
means of an extended Kalman filter (see Sect. 8.1.2.2). This procedure is continued 
until the vertex fit fails. Clustering is then resumed with an unused track. 

The greedy clustering does not guarantee the globally best assignment of tracks 
to vertices, as tracks that are attached to a vertex remain attached forever. This can 
be cured by using a robust vertex fit throughout (see Sect. 8.2), allowing a track to 
be removed from a vertex if it is tagged as an outlier. 


7.3.3 Iterated Estimators 


This is a divisive clustering algorithm with the following steps: 
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]. Perform a (preferably robust) vertex fit with all tracks. 
2. Discard all incompatible tracks. 
3. Repeat step 1 with all discarded tracks. 


The iteration stops when no vertex with at least two tracks can be successfully fitted. 
Step 2 might itself be iterative, especially if the vertex fit is not robust, so that 
the incompatible tracks have to be removed sequentially. An iterative vertex finder, 
based on an adaptive fit (Sect. 8.2.2) and called the Adaptive Vertex Reconstructor 
(AVR, [9]) is implemented in the RAVE toolbox [10, 11]; see Appendix C. 


7.3.4 Topological Vertex Finder 


A general topological vertex finder called ZVTOP was proposed in [12]. It is related 
to the Radon transform, which is a continuous version of the Hough transform used 
for track finding (see Sect. 5.1.2). The search for vertices is based on a function 
V(v) that quantifies the probability of a vertex at location v. For each track a 
Gaussian probability tube f; (v) is constructed. The function V (v) is defined taking 
into account that the value of f; (v) must be significant for at least two tracks: 


vor Bp Efe 


Due to the second term on the right-hand side, V(v) ^ 0 in regions where fj (v) 
is significant for only one track. The form of V(v) can be modified to fold in 
known physics information about probable vertex locations. For instance, V (v) can 
be augmented by a further function fo(v) describing the location and spread of the 
interaction point. In addition, V(v) may be modified by a factor dependent on the 
angular location of the point v. 

Vertex finding amounts to finding the local maxima of the function V(v). The 
search starts at the calculated maxima of the products f; (v) f; (v) for all track pairs. 
For each of these points, the nearest maximum of V (v) is found. As V(v) is a 
smooth function, any of the methods discussed in Sect. 3.1 can be employed. The 
found maxima are clustered together to form candidate vertex regions. The final 
association of the tracks to the vertex candidates can be done on the basis of the 
respective x? contributions or by an adaptive fit (see Sect. 8.2.2). An experimental 
application is described in [13]. 

In [14], the topological vertex finder was augmented by a procedure based on the 
concept of the minimum spanning tree of a graph. For each track, the bins crossed by 
the track (or a tangent to the track at the point of closest approach) are incremented 
by one. 
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7.3.5 Medical Imaging Vertexer 


The Medical Imaging Vertexer (MIV) [15] is similar to ZVTOP, but differs from 
it in two points: first, it works with a pixelized representation of the track density; 
second, it applies a medical imaging filter to the density before finding the maxima. 
The vertex finder can be summarized in the following steps: 


1. All tracks are back-projected into a volume to be searched for vertices. The 
volume is represented by a 3D histogram. The bin size of the histogram is 
comparable to the tube size in ZVTOP. 

2. The histogram is transformed into Fourier space, filtered by a medical imaging 
filter that removes artifacts and reduces blurring, and transformed back to a 
histogram in 3D space. 

3. The filtered histogram is searched for local maxima. Clustering starts with the 
highest bin. If the next highest bin is adjacent it is added to the cluster, otherwise 
it is the seed for the next cluster. This is iterated until no bin is above a predefined 
threshold. 

4. Clusters are split or merged using a resolution criterion similar to the one used in 
ZVTOP [12]. 

5. A cluster is accepted as a vertex candidate if its starting bin exceeds a predefined 
threshold. The vertex position is estimated as the center of gravity of the cluster. 


The performance of the MIV has been studied and compared to the Adaptive Vertex 
Reconstructor (AVR) in [15]. It is shown that the MIV finds vertices with higher 
efficiency and higher purity at large pile-up, whereas the AVR performs better at 
small pile-up. 
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Chapter 8 A) 
Vertex Fitting m 


Abstract The methods used for vertex fitting are closely related to the ones used 
in track fitting. The chapter describes least-squares estimators as well as robust and 
adaptive estimators. Furthermore, it is shown how the vertex fit can be extended to 
a kinematic fit by imposing additional constraints on the tracks participating in the 
fit. 


8.1 Least-Squares Fitting 


8.1.1 Straight Tracks 
8.1.1.1 Exact Fit 


Assume that there are n straight tracks that have to be fitted to a common vertex. 
Track i is given by a point r;, a unit direction vector a;, and the joint covariance 
matrix V; of q; = (ri;aj), i = l,...,n. The rank of V; is usually equal to five. 
Here and in the entire chapter, it is assumed that there is no material between the 
vertex and the point or surface where the track parameters are defined. 

The estimated common vertex is the point v that minimizes the sum of the 
weighted squared distances from the tracks. The squared distance D; of track i from 
the point v is given by: 


Di(v) = [(ri — v) x a;]?. (8.1) 


For a given vertex vo, the variance o? = var [Di] is computed by linearized error 
propagation. The Jacobian of D; with respect to q; is given by: 
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0 Dj 
an 21 —di.3 Ni 
j ar; 3D; 3 i,2 Ni,21 i,3 Mi,13 
= , zs EEE , 
i aD; ar; i,3 Ni,32 —4i,1 Ni,21 
4i 1 Ni,13 41,2 Ni,32 
da; 
aD; di,3 ni,13—di,2 Ni,21 
=2: | di 1 ni,21—di,3 i,32 
da; 


(8.2) 
di,2 ni,32—di,1 Ni,13 
with the auxiliary variables 


dik —rik — Vok, Ni, jk = di,j dik — Gk dij, j,k = 1,2, 3. 
It follows that 


o? & J] -Vi Ji. (8.3) 
Minimizing the sum of the squared distances gives the fitted vertex 0: 


n 


9 = arg, min S(v), with S(v) = >» au 


; (8.4) 
i=l o? 


The minimization with the Newton—Raphson method proceeds iteratively: 


1. Let vo be an approximate initial vertex position. Compute D;(vo), Ji and o? for 
i=1,...,n, according to Eqs. (8.1)-(8.3). 


2. Compute the gradient of S with respect to v at vo: 


aS 441 aD; , aD; aD; 
VS=— =) > “ , with LS (8.5) 
ðv o v dv or; 
3. Compute the Hessian matrix of S with respect to v at vo: 
> ny d25 + a}; 4,14,2 — G1 4,3 
v’S=) Hi, with H; =2-] —ai 1 ai. 2 a7, +a?, — di2 4i,3 
= -4,14,3 —4i,24;,3 a7; dp, 


(8.6) 
4. Compute the solution v; of VS = 0: 


vı = v0 — (vAs)^! -VS. 


(8.7) 
5. Set vg equal to v; and repeat from step 2 until convergence. 
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The covariance matrix of the final estimate d is given by (V?S)-!, and the x?- 
statistic of the fit is equal to .S(?). Its number of degrees of freedom is the sum 
of the ranks of all V; minus three. Prior information on the vertex position that 
is independent of the track information can be included by an additional term 
in Eq. (8.4) or after the fit by a weighted mean. 


8.1.1.2 Simplified Fit 


If the uncertainty of the direction vectors a; is neglected, the vertex fit can be further 
simplified [1]. Assume that track i is specified by a reference point r; = (xi, yi, zi)! 
in the vicinity of the vertex and a unit direction vector a; in spherical coordinates: 


a; = (cos gj COS Àj, Sin Qj COS Aj, sin AD! ; (8.8) 


where g; is the azimuth and A; = 77/2 — 0; the dip angle, i.e., the complement of the 
polar angle. For the purpose of the vertex fit, a convenient choice of the coordinate 
system for position is a system where the xj-axis is parallel to the track, the y, - 
axis is perpendicular to xj and z, and the z | -axis forms a right-handed orthonormal 
system with xj and y, . The coordinate transformation of the reference point r; to 
this track-based system is given by the following rotation: 


cos o; COSA; sing; cosa; sind; 
/ " 
r; = Riri = — sin gj COS Qj 0 rj. (8.9) 
— cos gj sin À; — sing; sin Àj COS Àj 


The coordinates y, and z are called the transverse and the longitudinal impact 
parameter, respectively. The fit described in [1] assumes that q; = (yi,.z 1)! has 
been estimated by the track fit, with the associated weight matrix G;, and that the 
direction errors are negligible. The transformation from r; to q; is given by the 2 x 3 
matrix T; consisting of the second and third line of R;. 

The vertex v is estimated by minimizing the sum of the weighted distances 
between the reference points and the vertex, transformed to the corresponding track- 
based systems: 


n 
Sw) = Y (ri - v)'T] GiTi(ri — v). (8.10) 
i=l 
The estimated vertex and its covariance matrix C are therefore given by: 
n n -1 
0— C) Wiri, with C= » D and W; — TI GiTi, i=1,...,n. 


i=1 i=l 


(8.11) 
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8.1.2 Curved Tracks 


The fits described in the preceding subsection can also be used with locally straight 
tracks for which the change of curvature and direction in the vicinity of the vertex is 
negligible. If this is not the case, nonlinearities in the track model have to be taken 
into account. 


8.1.2.1 Nonlinear Regression 


The general vertex fit can be formulated as a nonlinear regression model [2]. 
Assume that there are n tracks to be fitted to a common vertex. The tracks 
are specified by the estimated track parameters g; and the associated covariance 
matrices V;,i = 1,...,. The parameters to be estimated are the vertex position v 
and the momentum vectors p; of all tracks at the vertex, see Fig. 8.1. 

The track parameters q; are nonlinear functions of the parameters: 


q; = hiv, pj), i=1...,n. (8.12) 


reference cylinder 


Fig. 8.1 A vertex fit with four tracks. The parameters of the fit are the vertex v and the momentum 
vectors p;; the observations are the estimated track parameters q; 
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The first-order Taylor expansion of h; at a suitable expansion point eg = (vo, P; o) 
gives the following approximate linear model: 


qi X Aiv + Bip; t cji; i=1...,n, (8.13) 
with 
| Fm glas er = hilo pu) Amo Bipi 619 
This can be written as: 
di A; B, O O v a 
A O B5... O|| mi 
Cd PEE ETE (8.15) 
aa) AA 0 0... B) V) V" 
The LS estimates î and p; are obtained by: 
: M741 
Pl) _ M-N =. Mu (8.16) 
b, xa 
with 
Do Dı D2 D, 
Di Eı O O 
M=|P} 0 EB... O|, (8.17) 
DI oo En 
AIG; AIG: ... AlGn 
BIG, 0 O 
N=| 0 BiG... O |, (8.18) 
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Di = Al GiBi, E=B]GB;=W;', G=Vi!, i=1,...,n, 
(8.19) 


n 
Do = Y A} Gi Aj. (8.20) 
i=l 
C = M^! can be written as a block matrix with the blocks Cij, i, j —0,...,n: 


-1 
n 
Coo = (» -J DiW; of) (8.21) 


i-l 
Coj = —CooD;W;, Cjo= Cs j>0 (8.22) 
Cij = Wi; + W; DI CoD; Wj = 6;;W; — Wi D] Coj, i,j >0. (8.23) 


Substitution of Eqs. (8.21)-(8.23) into Eq. (8.16) gives the following expressions for 
the estimated parameters: 


n 

ô = Co Y ATG;Q - BjW;B1Gj) (4; — cj), (8.24) 
j=l 

p; = Wi BIGi(q; — ci - Aid), i=1,...,n. (8.25) 


The functions h; are re-expanded at the new expansion point e = (0, pj), i = 
l,...,n, and the fit is iterated until convergence. After convergence, the track 
parameters q; can be updated: 


ĝi = h; Ô, P;), i= l...,n, (8.26) 


In the linear approximation, the joint covariance matrix of d and all p; is equal to 
C = M^, from which the joint covariance matrix of all ĝ; can be computed by 
linearized error propagation. The x ?-statistic of the fit can be computed as follows: 


n 
x? 2 3a; - d)! Gig; -9). (8.27) 
i=l 


If the errors of the estimated track parameters can be assumed to be approximately 
Gaussian, the chi-square statistic is approximately x?-distributed with 


ndf = > rank(V;) — 3 (n + 1) (8.28) 


i=l 


degrees of freedom. 
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8.1.2.2 Extended Kalman Filter 


The nonlinear regression can be reformulated as an extended Kalman filter 
(see Sect. 6.1.2). Initially, the state vector consists only of the prior information 
about the vertex position vo, and its covariance matrix Co. In many instances, the 
prior information is given by the position and the size of the beam spot or the target. 
If no prior information is available, vo is a rough guess, and Co is set to a large 
diagonal matrix. 

For each track i, i — 1,...,n, the state vector is augmented by the three- 
momentum vector at the vertex p;. The system equation is the identity: 


vj —vi-j, Ci = Cii. (8.29) 


The measurement equation and its linearized form are the same as in Eqs. (8.12)- 
(8.14). The update of the vertex position and the estimation of p; can now be written 
as: 

vi = Ci [Cr vi- + ATP; - e]. (8.30) 


i 
pi = Wi B] Gi (q; — ci — Aivi), (8.31) 


with W; = (BI G; Bj)! and GP = G; — G; Bi W; B] G;. The updated covariance 
and cross-covariance matrices are: 


-1 

Var [v;] = C; = (Gr. T ATGPA;) (8.32) 
Var [p;] = Wi: + Wi B| Gi A; C; A} G; B; Wi, (8.33) 
Cov [v;, p;] = - Ci A} Gi Bi Wi. (8.34) 


Each update step gives rise to residuals r; and a chi-square statistic x > 


ri =q; — hi(vi, pj). (8.35) 
x? =r] Giri + (wi — vi)! C7} (vi — vii). (8.36) 


The chi-square statistic has two degrees of freedom and can be used to test the 
compatibility of track i with the current fitted vertex. If no intermediate results 
are needed, the computation of the momentum vectors p; can be deferred to the 
smoother, and the final vertex v, and its covariance matrix C,, can be computed 
directly, cf. Eqs. (8.21) and (8.24): 


n 
v, — C, in +), AIG (qj; - J , (8.37) 
i=l 
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-1 
n 
Co = G +5 atapa) ; (8.38) 


i=] 


As there is no process noise in the system equation, the smoother is tantamount 
to recomputing the momentum vectors and the covariance matrices with the final 
vertex v, and its covariance matrix C,,, see also Eqs. (8.31)-(8.34): 


Pin = Wi B] Gi (qj — ci — Ain), (8.39) 

Var [Pin] = Wi + Wi B] G;A;C„A/ Gi Bi Wi. (8.40) 

Cov [vn, Pijn] = —Cn Aj Gi Bi Wi. (8.41) 
Cov [pii P jin] = Wi BI Gi AiC, A; Gj B;W . (8.42) 


The update of the track parameters reads: 
ĝi = hi (Yn, Pin) i=1..., n. (8.43) 
Their joint covariance matrix can be computed by linearized error propagation. Each 


track can be tested against the final vertex by computing the smoothed residuals and 
the corresponding chi-square statistic: 


Fijan — Qi — hi(vn, Pijn)» (8.44) 
Xin =T n Girila + (Un — va|- Cpi On — Vaji), (8.45) 


where v„|-; is the final vertex with track i removed, and C, is its covariance 
matrix: 


Uni = Cui [Cy vs - ATGPG: - e]. (8.46) 
-1 
Var [vni] = Cu-i= (C7! ATGBA:) . (8.47) 
Searching for outliers in this way is, however, tedious and time consuming. The 
adaptive vertex fit described in Sect. 8.2 is better suited to and more powerful for 
this task, especially if there are several outliers. 
8.1.2.3 Fit with Perigee Parameters 
In many collider experiments, past and present, the magnetic field in the vicinity of 


the collision region is almost perfectly homogeneous, giving a helical track model. 
The “perigee” parametrization for such helical tracks was introduced in [3], with a 
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Fig. 8.2 A helical track in 
the projection to the 

(x, y)-plane. O: origin; 
C:circle center; P: perigee 
point; V: vertex; p: circle 
radius; tp: tangent at P; pp: 
azimuth of track direction at 
P; ty: tangent at V; gy: 
azimuth of track direction 
at V 
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track 


correction in [4]. The track is parametrized around the point of closest approach, the 
perigee point vP, of the helix to the z-axis, which is also the direction of the beams 
and the magnetic field. The perigee point is expected to be close to the vertex. The 
five track parameters of the perigee parametrization are (see Fig. 8.2): 


1. The impact parameter e. By convention, the sign of e is positive if the origin O 
is at the left side of the trajectory. 


nb Wh 


is anti-clockwise. 


. The azimuth gp of the tangent to the track at P. 

. The z-coordinate zp of the perigee point P. 

. The polar angle 2 of the helix with respect to the z-axis. 

. The signed curvature x. By convention, the sign of « is positive if the trajectory 


With these definitions, the trajectory can be approximately parametrized in terms 
of a running parameter s, which is the distance from P along the projected helix: 


X ^ 


yr 


s2k 


€ Sin gp + 5 COS op — E sin gp, (8.48) 


2 
—€ COS pp + 5 sin gp + = COS gp, (8.49) 


z Z zp + scot 2. 


The track parameters q = (€, gp, 9, Zp, K)! have to be expressed as a function 
of the coordinates v = (xv, yv, zy)! of the vertex V, and the track parameters 
p = (9, pv, x)" at V. Note that 9 and « are invariant along the helix. As higher 
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orders of x can usually be neglected for tracks in collider experiments, the following 
functional dependence is obtained: 


e &« —R — Q?ky2, 


zp © zy — Q(1 — Rk) cot 2, (8.50) 
gp © gy — Ok, 
with 
Q = xy cos oy + yy singy, R= yy cos gy — xy sin oy. (8.51) 


The Jacobian matrix at the lowest order is given by: 


sS —tc | —kc 
[6 ts KS 
oq 0 1 0 
= ; 8.52 
(v, p) 0 ga+ 0 en 
Q —Rt 1 
-Qu2 gm -Q 
with 
c = cosy, s = singy, f = cot. (8.53) 


With these ingredients, the nonlinear regression (see Sect. 8.1.2.1) can be computed. 


8.2 Robust and Adaptive Vertex Fitting 


8.2.1 Vertex Fit with M-Estimator 


Estimators are called robust if they are insensitive to outlying observations [5- 
7]. The M-estimator, see Sect. 6.2.1, is a well-known robust estimator that can be 
implemented as an iterated reweighted LS estimator that assigns smaller weights 
(larger uncertainties) to observations suspected to be outliers. One of the first 
attempts, if not the first, to make the vertex fit robust is the extension of the Kalman 
filter to an M-estimator [8] of the Huber type [5]. Before the fit, the parameters of 
each track are decorrelated by finding the orthogonal matrix U; that transforms the 
covariance matrix V; to a diagonal matrix Di: 


D; = Ui V; U]. (8.54) 
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Table 8.1 Algorithm: Vertex fit with M-estimator 
VERTEX FIT WITH M-ESTIMATOR 


1. Set e = 1076, set the iteration counter J = 0 and set Vi © — Vi, i = l,...,n. Choose a 
robustness constant c, usually a value between 1 and 3. 

2. Compute the LS estimate v/ by one of the estimators in Sect. 8.1, using the covariance 
matrices V;“. For each track i, compute the smoothed residuals rj|n. 

3. For each component rjjn,;, j = 1,...,5, compute a weight w;,; according to: 


D (8.56) 


Wi, į = Max | €, 
"iln. j/ Vij 


where V; jj is the j-th diagonal element of the initial covariance matrix V;, and y(t) is one 
of the functions in Table 6.1 or a similar one from the literature. 

4. Increase the iteration counter / by 1. For each track i and all j, set y = Vi jj/Wi, j. 

5. Check convergence; if necessary, go to step 2. 


The measurement equation Eq. (8.13) is transformed by setting 


V; < Di, q; < Uiqj, Ai <4 U;jA;, Bj < U;B;, ci < Uici. (8.55) 


The M-estimator is then computed by an iterated LS estimator, see Table 8.1. Vertex 
fits using M-estimators with other weight functions are described in [9, 10]. 


8.2.2 Adaptive Vertex Fit with Annealing 


The adaptive vertex fit (AVF) was introduced in [11] and further investigated in [12— 
15]. It can be interpreted either as an EM algorithm or as an M-estimator with a 
specific weight function [16, 17]; see also Table 6.1. In the AVF, tracks as a whole 
are down-weighted, as the weight w; is a function of the distance of track i from the 
current vertex, measured by the chi-square x n 


32. .T 2 exp (-x?/2T) 
POTERIS ) = ; 8.57 
Xi Fi In iFiln, Wi (x exp (-x?/2T) + exp (-x2/2T) i l 


where T is a temperature parameter. The weight w; can be interpreted as the 
probability that track i belongs to the vertex. The chi-square cut x2 sets the threshold 
where the weight is equal to 0.5. Beyond this cut, a track is considered to be an 
outlier rather than an inlier. 

The temperature T modifies the shape of the function in Eq. (8.57). At high 
temperature, the weight function varies very slowly; at low temperature, the weight 
is close to 1 for x < x2 and close to 0 for x2 > x2 (see Fig. 6.1c), with a sharp 
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Table 8.2 Algorithm: Adaptive vertex fit with annealing 


ADAPTIVE VERTEX FIT WITH ANNEALING 


1. Set the iteration counter J = 0 and set V; Q9) — Vi, i=1,...,n. Choose a threshold x2, an 
initial temperature Tọ, and an initial vertex estimate py, 

2. For each track i, compute the smoothed residuals rj), and the x?-statistic x 2 in Eq. (8.57), 
using v“? and V;. Compute the weight w; according to: 


wi = SC) (8.58) 
: exp (-x2/27i) + exp (—x2/2Tk) | ` 


3. Increase the iteration counter J by 1. For each track i, set yv; = Vi/wi. 

4. Compute the LS estimate v” by one of the estimators in Sect.8.1, using the inflated 
covariance matrices V; . 

5. Set the new temperature Ty according to a chosen annealing schedule and go to step 2. 

6. When the final temperature is reached, iterate until convergence. 


drop at x = = 32 The weights at low temperature can be used to identify secondary 
tracks in a fit of the primary vertex; see Sect. 7.3.3. 

Similar to the M-estimator in Sect. 8.2.1, the adaptive vertex fit is implemented 
as an iterated LS estimator, which can be one of the methods described in Sect. 8.1. 
The temperature T can be used to employ an annealing procedure that helps to reach 
the globally optimal solution. The adaptive vertex fit is summarized in Table 8.2. 

In order to get reasonable initial weights, the initial vertex v® has to be chosen 
carefully, preferably with a robust finder [18]. The annealing schedule and the 
constant x2 have to be tuned on simulated data. A detailed study and useful hints 
can be found in [15]. 


8.2.3 Vertex Quality 


The assessment of the quality of a fitted vertex is similar to the assessment of track 
quality; see Sect. 6.4. The primary criterion is the chi-square statistic of the vertex 
fit, or its p-value. If the p-value is unreasonably small, a search for outlying tracks 
can be started. There are several sources of outliers in the vertex fit: 


* Fake tracks and tracks that include unrecognized extraneous measurements or 
noise hits. 

* Tracks with an unrecognized interaction (kink) in the material. 

* Tracks with a covariance matrix that does not properly reflect the statistical 
uncertainty of the track parameters, for instance, because of an incorrect eval- 
uation of material effects. 

* Tracks that belong to a different vertex, in particular to a nearby secondary vertex. 
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Just as in the case of the track fit, outliers can be detected by a test based on the 
residuals of the track with respect to the estimated vertex position. Outlying tracks 
that pass the track quality check are often candidates for inclusion in a secondary 
vertex; see Sects. 7.3.3 and 9.2. 


8.3 Kinematic Fit 


The vertex fit as described so far imposes purely geometrical constraints on the 
participating tracks, namely that they originate at the same point in space. Especially 
in the case of a secondary vertex, the vertex fit can be extended by imposing other 
geometrical or kinematical constraints. In the vertex fit of a photon conversion 
(see Sect. 9.4), the constraint can be imposed that the momentum vectors of the 
outgoing tracks are parallel. In the fit of a decay vertex (see Sects. 9.2 and 9.3), the 
laws of momentum and/or energy conservation can be imposed as constraints. In 
addition, assumptions about the mass of some or all of the participating particles 
can be included. The width of a resonance can be included by considering its mass 
as a virtual measurement with a standard error reflecting the width. Such fits are 
called kinematic fits. 

In a kinematic fit, a track is most conveniently represented by a collection u of 
parameters that are physically meaningful and allow a simple formulation of the 
constraints [19]. If only kinematic constraints are present, u contains the energy and 
the Cartesian momentum components in the global coordinate system: 


u = (E, Px, Py, Pz)". (8.59) 


For charged tracks, u is computed from the usual representation by a five-vector q, 
and the covariance matrix V = Var [u] is obtained by linearized error propagation. 
If the mass of the particle is assumed to be known and fixed, the rank of V is three, 
as the energy is a deterministic function of the momentum p and the mass m. If 
there is an independent measurement of the energy, for instance by a cluster in the 
calorimeter, the rank is four. For neutral tracks, u is computed from the calorimeter 
information. As there is no independent momentum measurement, the rank of V is 
three. 

If in addition a vertex constraint is to be enforced, the location x, y, z in space, 
which can vary along the track, is appended to u, and the rank of V increases by 
two [19]. 

If n tracks with parameters ug and covariance matrices Vy, k = 1,...,n, 
participate in the kinematic fit, their parameters are combined in a single column 
vector y: 


u] 
y=]: |. (8.60) 


Un 
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If the estimated u; are uncorrelated, their joint covariance matrix V is block-dia- 
gonal: 


Var[y] = V = blkdiag(V i, ..., Vn). (8.61) 


If the momentum parts of the u; are the results of a preceding vertex fit, they are 
correlated, and their joint covariance matrix is given in Eqs. (8.40) and (8.42). 

The vector y is considered to be an unbiased observation of the vector « that 
satisfies r « rank(V) kinematical or geometrical constraints: 


y=a+e, E[e]=0, Var[e] = V. (8.62) 


The constraints are expressed by r equations h;(@) = 0, i = 1,...,r, which can 
be written compactly in the following vector form: 


h(a) =0, with h = (hi (œ), ..., h (o)! . (8.63) 


The function h is approximated by its first-order Taylor expansion at the expansion 
point €) = y, resulting in a set of linear constraints: 


h(a) ~ h(ao) + H(a — æo) = h(ao) + Ha — Hao = Ha +d = 0, (8.64) 


with the Jacobian matrix H = dh/da evaluated at « = æo and d = h (æo) — Hao. 
It is assumed that H has rank r in a sufficiently large neighbourhood of y. 

If V has full rank, the constrained LS estimate of œ is obtained by minimizing 
the following objective function: 


S(a, X) = (a — y)! G (æ — y) + 2A! (Ha +d), (8.65) 


where G = V! and is the vector of Lagrange multipliers. Setting the gradient of 
S(a, à) to zero results in the following system of linear equations: 


G (a — y) - HÀ — 0, (8.66) 
Ha +d — 0. (8.67) 


The system can be explicitly solved for «, the solution being the estimate &: 
&—y—VH'Gg(Hy--d), with Gu = (HVH") => (8.68) 
The solution is valid also for singular V as long as r < rank(V), so that HV H T 


has full rank r and can be inverted. This can be proved by regularization of V; see 
Appendix B. 
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The constraints are re-expanded at the new expansion point «o = &, and the fit 
is iterated until convergence. The covariance matrix Vg of the final estimate & and 
the chi-square statistic of the fit are given by: 


V;,=V-VHT'GuHVv, (8.69) 
x? 2 (Hy +d)' Gy (Hy +4). (8.70) 


The x? has r degrees of freedom and can be used to assess the goodness of the 
fit, provided that the linear approximation of the constraints is satisfactory and the 
distribution of the error term e in Eq. (8.62) is close to normal. 

A comprehensive software package for kinematic fitting, called KWFIT, can be 
found in [20]. In addition to stand-alone kinematic fits as described above, it allows 
fiting entire decay chains with multiple vertices. For a discussion of its features 
and of kinematic fitting in general, see [21]. More recent decay chain fitters are 
described in [22] and [23]. 


References 


1. V. Karimäki, Effective Vertex Fitting. Technical Report CMS-NOTE-1997-051, CERN, 
Geneva (1997). http://cdsweb.cern.ch/record/68753 | 

2. P. Billoir, R. Frühwirth, M. Regler, Nucl. Instrum. Meth. Phys. Res. A 241, 115 (1985) 

3. P. Billoir, S. Qian, Nucl. Instrum. Meth. Phys. Res. A 311, 139 (1992) 

4. P. Billoir, S. Qian, Nucl. Instrum. Meth. Phys. Res. A 350, 624 (1994) 

5. PJ. Huber, E.M. Ronchetti, Robust statistics, 2nd edn. (Wiley, Hoboken, 2009) 

6. FR. Hampel, et al., Robust Statistics: The Approach Based on Influence Functions (Wiley, 
Hoboken, 2011) 

7. P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection (Wiley, Hoboken, 1987) 

8. R. Frühwirth, et al., Comput. Phys. Commun. 96, 189 (1996) 

9. G. Agakichiev, et al., Nucl. Instrum. Meth. Phys. Res. A 394, 225 (1997) 

0. M. Kucharczyk, P. Morawski, M. Witek, Primary Vertex Reconstruction at LHCb. Techni- 
cal Report CERN-LHCb-PUB-2014-044, CERN, Geneva (2014). https://cds.cern.ch/record/ 
1756296 

11. W. Waltenberger, Development of vertex finding and vertex fitting algorithms for CMS. Ph.D. 
thesis, TU Wien (2004). http://repositum.tuwien.ac.at/obvutwhs/download/pdf/1562338 

12. J. D’ Hondt, et al., IEEE Trans. Nucl. Sci. 51(5), 2037 (2004) 

13. E. Chabanat, et al., Nucl. Instrum. Meth. Phys. Res. A 549, 188 (2005) 

14. T. Speer, R. Frühwirth, P. Vanlaer, W. Waltenberger, Nucl. Instrum. Meth. Phys. Res. A 566, 
149 (2006) 

15. W. Waltenberger, R. Frühwirth, P. Vanlaer, J. Phys. G 34, N343 (2007) 

16. R. Frühwirth, W. Waltenberger, Austrian J. Stat. 37(3&4), 301 (2008). https://tinyurl.com/ 
RedescendingMestimators 

17. A. Strandlie, R. Frühwirth, Rev. Mod. Phys. 82, 1419 (2010) 

18. R. Friihwirth, et al., in Proceedings of the 13th International Conference on Computing in 
High-Energy and Nuclear Physics (CHEP 2003), vol. eConf C0303241 (2003), p. TULTO13. 
https://www.slac.stanford.edu/econf/C030324 1/proc/papers/TULT013.PDF 

19. P. Avery, Applied Fitting Theory VI — Formulas for Kinematic Fitting. Technical Report CBX 
98-37, CLEO Collaboration (1998). https://tinyurl.com/Avery- KinematicFit 


158 8 Vertex Fitting 


20. P. Avery, KWFIT. https://tinyurl.com/Avery- KWFIT 

21. P. Avery, in Proceedings of the International Conference on Computing in High Energy and 
Nuclear Physics (2000). http://chep2000.pd.infn.it/paper/pap-a207.pdf 

22. W.D. Hulsbergen, Nucl. Instrum. Meth. Phys. Res. A 552(3), 566 (2005) 

23. S.Gorbunov, I. Kisel, Reconstruction of decayed particles based on the Kalman filter. Tech- 
nical Report CBM-SOFT-note-2007-003, STAR experiment, Brookhaven National Laboratory 
(2007). https://tinyurl.com/Gorbunov- KFParticle 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter's Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter's Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Chapter 9 N 
Secondary Vertex Reconstruction dag 


Abstract The chapter reviews methods for the search for secondary vertices. Four 
types of secondary vertices are discussed in detail: decays of short-lived particles, 
decays of long-lived particles, photon conversions, and hadronic interactions in the 
detector material. 


9.1 Introduction 


A vertex is called a secondary vertex or displaced vertex if it is outside the 
beam profile in a collider experiment or outside the target region in a fixed-target 
experiment. Secondary vertices arise in the following contexts: 


Decays of unstable particles The decaying particle is call the “mother” particle, 
the decay products are called the “daughter” particles. Depending on the lifetime 
and the momentum of the mother particle, the secondary decay vertex is displaced 
by a certain distance from its production point, which can be either a primary or 
itself a secondary vertex. The properties of the mother particles have to be inferred 
from the decay products. Finding decays is important for many types of physics 
analyses as well as for momentum scale calibration, by comparing the reconstructed 
mass of the mother particle to the known one. 


Interactions in the detector material Finding interaction vertices can be useful 
for mapping the amount and position of material in the detector. Two types of 
interactions are important in practice: photon conversions, i.e., pair production of 
electrons and positrons or muons in the electric field of a nucleus; and hadronic 
interactions of primary particles. A decay of a charged particles into a single charged 
daughter particle plus some neutral daughters looks just like a kink in the track. 
Kink finding is the subject of Sect. 6.4.3. The following subsections present some 
methods for finding decay and interaction vertices. 
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9.2 Decays of Short-Lived Particles 


In the present context, short-lived particles are defined as particles that decay before 
they enter the first layer of the innermost tracking device. This includes B (beauty) 
and D (charmed) hadrons as well as t leptons, which travel no more than a few 
millimeters before they decay. 

The key to successful secondary vertex reconstruction is a precise knowledge 
of the primary vertex and the correct selection of the tracks of the decay products. 
In order to make sure that the primary vertex is not contaminated with secondary 
tracks, fake tracks or—at the LHC—pile-up tracks, the fit has to be made robust, 
either by removing outliers (Sect. 8.2.3), or by employing a robust fitting algorithm 
in the first place (Sects. 8.2.1 and 8.2.2). If there is prior information on the beam 
spot, it should be included into the primary vertex fit. The tracks removed from 
the primary vertex or down-weighted as outliers are by definition candidates for 
secondary tracks; see Sect. 7.3.3. 

The selection of the secondary tracks depends on the physics goals of the 
analysis. If a specific decay channel is considered, the tracks can be selected 
according to kinematic criteria, for instance their type, charge, and momentum [1]. 
In addition, the distance of the track from the primary vertex in the transverse 
plane, called the transverse impact parameter d, can be used to verify that the 
selected tracks are not produced at the primary vertex. As the uncertainty of d? 
depends on the angle and the momentum of the track, the test is based on the 
standardized impact parameter s° = d /o [d], also called the significance of the 
impact parameter. 

Secondary tracks can also be selected by their impact parameter alone. To this 
end, the impact parameter of a track is given a sign that is positive if the track 
intersects the trajectory of the decaying hadron or r lepton downstream of the 
primary vertex [1—3]. As the trajectory is not known before the secondary vertex 
fit, it is approximated by the jet axis of the jet the decaying particle is embedded in. 
Only tracks with positive sign are candidates for secondary tracks. 

The information contained in the impact parameter of a secondary track can be 
augmented by considering its functional relation with the azimuth angle of the track 
in the transverse projection, see Fig. 9.1. Let $1,..., Ọm be the azimuth angles and 
dp. T d the transverse impact parameters of m tracks emerging from the same 
secondary vertex. If the primary vertex is very close to the origin of the coordinate 
system, as it usually is, the following functional relation holds approximately: 


dP z £ sind; — bn) © Lli — $n), i — 1... m, (9.1) 


where £ is the decay length of the decaying particle and $y its azimuth angle. Thus 
in the ($, d°)-plane the points corresponding to secondary tracks lie on a straight 
line. This line can be found in various ways, for instance, by a Hough transform 
(Sect. 5.1.2), by histogramming the direction angles of segments connecting two 
points or by agglomerative clustering of the segments [3]. 
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Fig. 9.1 The functional relation between $ and d of secondary tracks 


9.3 Decays of Long-Lived Particles 


Long-lived particles such as neutral K-short mesons K 2 and A baryons decay in 
the tracker volume. Reconstruction of their decays serves as a powerful cross-check 
of the magnetic field map and the alignment, as their mass is very well known, and 
systematic errors in the field map as well as misalignment can lead to a bias in 
the momentum and invariant mass estimation. Candidates for the decay products 
can be identified by a large impact parameter and the fact that tracker hits are 
consistently missing up to a certain layer. In the search for 2-prong neutral decays, 
so-called VOs, pairs of tracks with opposite charge and small distance in space 
are combined to vertex candidates. Without particle identification, the Armenteros— 
Podolansky plot is a useful tool to decide between hypotheses about the mother 
particle from the kinematics of their decay products [4]. It is a scatter plot of the 
longitudinal momentum asymmetry versus the transverse momentum of the positive 
decay product, see Fig.9.2. The figure shows that in some regions of the plot a 
unique decision may not be possible. 

The resolution of eventual ambiguities requires a kinematic fit with mass 
constraints (see Sect. 8.3) based on particle identification of the decay products. For 
example, in the case of K : versus A/ A, charged pions have to be separated from 
protons. 

If the decay vertices are allowed to have more than two charged outgoing 
particles, two-track vertices can serve as seed vertices [5]. A seed vertex is rejected if 
at least one of its tracks has hits between the seed vertex and the primary vertex. The 
seed vertices are then clustered to larger vertices by an agglomerative procedure. 


162 9 Secondary Vertex Reconstruction 


5 pr/GeV 


0.2 


0.15 


0.1 


0.05 


-1 -0.8 -0.6 -0.4 -02 0 0.2 04 06 0.8 1 
(pt- p) /(pt*- p) 


Fig. 9.2. Armenteros-Podolansky plot for K s and A/A. (From [4], with permission by the author) 
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9.4 Photon Conversions 


The reconstruction of photon conversions is an integral part of photon reconstruction 
in general. While the energy of unconverted photons is measured directly in the 
electromagnetic calorimeter (ECAL), a converted photon is reconstructed from the 
charged lepton pair, mostly an e*e” pair, that is produced in the electric field of a 
nucleus. A characteristic feature of the conversion vertex is the fact that the two 
leptons are parallel to each other. As a consequence, the vertex position along 
the photon direction is not very well defined by the vertex fit alone. Two topical 
examples of the reconstruction of converted photons are presented in the following 
paragraphs. 

In the ATLAS experiment (Sect. 1.6.1.2), the first step is the standard elec- 
tron/positron reconstruction, briefly described in Sect. 10.2. Clusters in the ECAL 
are built and used to create regions of interest (ROIs). Inside an ROI track finding 
is modified such that up to 30% energy loss is allowed at each material layer. After 
loosely matching the tracks in the ROIs with the ECAL clusters, tracks with silicon 
hits are refitted with the Gaussian-sum filter (GSF, Sect. 6.2.3). Conversion finding 
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is run on the loosely matched tracks that have a high probability to be electrons or 
positrons according to the transition radiation detector. A conversion may have one 
or two outgoing tracks. Double-track conversions are created when two tracks form 
a vertex that is consistent with the hypothesis that they have been produced from 
a massless particle. Single-track conversions look like tracks without hits before 
a certain layer. If there are several conversions matched to a cluster, double-track 
conversions are preferred over single-track conversions. 

In CMS (Sect. 1.6.1.3) the reconstruction of photon conversions uses the full 
tracking [6]; see also Sect. 10.3. Electrons and positrons in particular are recon- 
structed by associating a track in the silicon tracker with a cluster in the ECAL [7]. 
Besides the standard seeding of tracks in the pixel detector, additional seeds 
are computed by extrapolating the electron/positron from the cluster toward the 
interaction vertex, using both charge hypotheses. Starting from these seeds, track 
candidates are built by the combinatorial Kalman filter (Sect. 5.1.7) and sent to track 
fit with the GSF (Sect. 6.2.3). Electron/positron candidates are then created from the 
association of a GSF track with a cluster in the ECAL. Track pairs of opposite charge 
sign are selected and filtered by various constraints: the two tracks must have small 
angular separation; the trajectories (helices) projected to the transverse plane must 
not intersect; and the presumptive vertex must be inside the tracker volume. Track 
pairs that pass these filters are sent to the vertex fit with the constraint that the tracks 
are parallel at the vertex; see Sect. 8.3. The track pair is kept if the p-value of the 
chi-square statistic indicates a good fit. 
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Knowledge of the material in a tracking device is important for understanding and 
tuning the track reconstruction algorithms. Reconstruction of hadronic interactions 
in the tracker material gives a precise map of the amount and location of active and 
inactive parts of the device. The hadronic interactions result in secondary vertices 
that have to be found. Whereas, in photon conversions, the vertex position along the 
photon direction is ill defined, the vertex position of hadronic interactions can be 
precisely estimated in all directions. 

Tracks from secondary interaction vertices have a large impact parameter and 
may not be found by the standard track finder(s). In order to have the largest possible 
sample of such tracks, a special track finding pass can be implemented. Tracks from 
the primary vertex and short-lived decays are suppressed by a lower bound on the 
transverse impact parameter. Further quality cuts can be applied to select a reliable 
sample of secondary tracks. The following paragraphs describe examples from the 
experiments at the LHC. 

In [8], a vertex finder is described that simultaneously finds all secondary vertices 
in an event in the ATLAS detector (Sect. 1.6.1.2). It starts by finding all possible 
intersections of pairs of selected tracks, performs a vertex fit and applies a quality 
cut. Wrong combinations can be further suppressed by requiring that neither track 
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has hits in the layers with a smaller radius than the vertex. Finally, nearby vertices 
are merged, and incompatibility between vertices sharing tracks are resolved via 
an incompatibility graph; see Sect.5.3. For results on reconstruction efficiency 
and vertex resolution, see [8]. The search for hadronic interaction vertices can be 
complemented by the reconstruction of photon conversions; see Sect. 9.4. 

A precision measurement of the structure of the CMS pixel detector 
(Sect. 1.6.1.3), including sensors and support material, and the beam pipe using 
hadronic interactions is reported in [9]. The track selection requires a minimal 
transverse momentum of 0.2 GeV/c. For all pairs of selected tracks, their distance 
of closest approach is computed; if the distance is below a preset threshold, the 
midpoint of the points of closest approach is kept as a vertex candidate. These 
two-track vertices are then merged to larger vertices by agglomerative clustering, 
(Sect. 3.3.1). These vertices are sent to the adaptive vertex fit (AVF, Sect. 8.2.2). 

The tracks associated to a fitted vertex are classified into incoming, outgoing, 
and merged tracks based on their transverse impact parameter and the number of 
hits before and after the vertex. At least three tracks have to be present, but not more 
than one incoming or merged tracks. Additional filters using the properties of the 
outgoing tracks serve to further reduce the number of fake interaction vertices. For 
results on resolution, efficiency and purity of the vertex reconstruction, see [9]. 

The Vertex Locator (VELO) of the LHCb experiment, see Sect. 1.6.1.4, was 
mapped by the reconstruction of secondary interaction vertices of hadrons produced 
in beam-gas collisions [10]. The data were collected during special runs in which 
helium or neon was injected into the collision region. The tracks used for the recon- 
struction of secondary vertices come from a modified track finding procedure that 
makes no prior assumption about the vertex position. Only well-reconstructed tracks 
with at least three 2D hits are selected. The secondary vertices are reconstructed 
from at least three tracks and have to be of good quality. They also must not be 
compatible with the primary vertex region. Only events with a single secondary 
vertex are used in the analysis. Results and the verification of the procedure with 
photon conversions to two muons can be found in [10]. 
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Part IV 
Case Studies 


Chapter 10 A) 
LHC Experiments m 


Abstract The chapter gives an overview of the track and vertex reconstruction 
methods of the LHC experiments that were used in production during Run 2 of 
the LHC, which ended in autumn of 2018. 


10.1 ALICE 


Track reconstruction in ALICE [1, 2] starts with cluster finding in the two principal 
tracking detectors, i.e., in the pad rows of the TPC and in the silicon sensors of the 
Inner Tracking System (ITS, Sect. 1.6.1.1). The clusters reconstructed in the two 
innermost pixel layers of the ITS are used for a preliminary reconstruction of the 
primary vertex. Track reconstruction takes place through three main passes. 

The first pass starts out in the outer part of the TPC and proceeds inwards through 
the ITS to the vertex region. Track seeds for primary particles are formed from pairs 
of measurements in the outer pad rows of the TPC and the primary vertex. Beginning 
from the seeds, track finding proceeds in the inward direction using a combinatorial 
Kalman filter (see Sect. 5.1.7). 

Each reconstructed TPC track is extrapolated to the outermost layers of the ITS 
and used as a seed for track finding in the ITS. Track following in the ITS uses the 
particle hypothesis computed from the energy loss in the TPC, if available. Seven 
hypotheses are considered: electrons, muons, pions, kaons, protons, deuterons and 
tritons. If the dE/dx information is missing or inconclusive, the pion mass is used. 
An illustration of how a particle hypothesis can be made using a combination 
of dE/dx information (from the TPC measurements) and momentum information 
(from track reconstruction) is shown in Fig. 10.1. 

A difficult point in the first-pass inwards tracking is the transition between the 
TPC and the ITS, due to the quite long (approximately 0.5 m) propagation distance 
between the two tracking systems. Given the uncertainty of the position of the track 
candidate propagated from the TPC and the high density of clusters in the ITS, many 
measurements are potentially compatible with the extrapolation. In order to lower 
the combinatorial complexity, the position coordinates of ITS hits used in finding 
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Fig. 10.1 Ionization energy loss as a function of momentum for a set of particles in the 
ALICE experiment. (From https://arxiv.org/abs/1701.048 10. OCERN for the benefit of the ALICE 
Collaboration. Reproduced under License CC-B Y-4.0) 


primary tracks are augmented by the two angles describing the direction from the 
measurement to the primary vertex, effectively making the hits four-dimensional. 
When all layers of the ITS are traversed, the best surviving candidate (in terms 
of a quality measure using for instance the total x? of the track and the number of 
assigned clusters) of each combinatorial tree is selected. Additional track candidates 
are then found by a stand-alone search in the ITS, using hits not attached to TPC 
tracks. 

During the outward propagation in the second-pass through ITS, TPC and transi- 
tion radiation detector (TRD), the track length and seven time-of-flight hypotheses 
per track (corresponding to the seven particles mentioned above) are calculated. 
This information is subsequently used for particle identification in the time-of-flight 
detector (TOF). Whenever possible, tracks are matched with hits in the TOF and 
other detectors outside the TRD. 

In the third pass towards the vertex region, measurements assigned to the 
surviving track candidates during the first pass are used in the refit. The primary 
vertex is again fitted, now using the full available information from the reconstructed 
tracks as well as the average position and spread of the beam-beam interaction 
region. For a review of the track reconstruction performance, see [2, 3]. 

The high-level trigger (HLT) of ALICE also performs GPU-accelerated track 
finding and fitting in the TPC [4]. A cellular automaton finds track seeds, which 
are then extended by a simplified Kalman filter. After track segments are merged 
to the final track candidates, a full Kalman filter track fit is performed. A detailed 
description of all stages of the HLT can be found in [5]. The current HLT tracking is 
the base for the future TPC tracking in Run 3, which will run in GPUs as well [6]. 
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10.2 ATLAS 


The main track reconstruction strategy in the ATLAS experiment (see Sect. 1.6.1.2) 
is to start out with track seeds in the innermost part of the Inner Detector and proceed 
outwards [7-9]. First clusters are assembled in the pixel and SCT detector sub- 
systems by a connected component analysis (CCA) [10], and from these clusters 
3D measurements (so-called space-points) are created. In dense environments such 
as the core of high-energy jets adjacent clusters from neighbouring particles can 
be so close that they overlap. Identifying such merged clusters correctly during 
the track reconstruction procedure is very important, and ATLAS has developed 
a method using a multi-layer, feed-forward neural network in order to solve this 
task efficiently [11]. 

Track seeds are generated from sets of three space-points. Seeds are processed 
iteratively by first considering sets of space-points from the SCT detector, then 
sets from the pixel detector, and finally sets originating from a combination of the 
two sub-detectors. Starting out from the seeds, track finding is carried out by a 
combinatorial Kalman filter (see Sect. 5.1.7). Each seed can potentially give rise to 
a number of track candidates, as long as the candidates pass a set of quality criteria. 
Efforts to speed up the track reconstruction are ongoing at the time of writing; first 
results based on improving the purity of the seed collection are given in [12]. More 
recent results are described in [13]. 

After iterating over all the seeds, the set of all track candidates are processed 
by an ambiguity solver. A flow diagram of the ATLAS ambiguity solver is shown 
in Fig. 10.2. 

The track candidates are first ranked by a track score given to each candidate. 
The track score definition is intended to give high scores to candidates with a high 
probability of being a real track and therefore depends on, e.g., cluster resolutions, 


Input tracks 


Calculate 
track scores 
and 
Reject tracks 
with bad score 


Order tracks Fit tracks fulfilling 
according to score minimum requirements 

(process from (Neural network used to 
highest to lowest) predict cluster positions) 


Accept track candidate Output tracks 


or 
Rejected Reject track candidate ‚if 
tracks * too many holes 


= too few clusters 

= problematic pixel cluster(s) 
Create or 
stripped-down Recovertrack candidate ‚if 
track candidate = too many shared clusters 
(Neural network used to 


identify merged clusters) 


Fig. 10.2 Flow diagram of the ATLAS ambiguity solver. (From [9], reproduced under License 
CC-BY-4.0) 
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the number of holes on the track and the track x”. Track candidates passing a set 
of basic quality criteria are submitted to a full, high-resolution track fit using all 
available information relevant for an optimal estimation of the track parameters. 
After the track fit, track candidates can either be accepted, rejected or stripped down 
if the candidate contains too many clusters shared by other track candidates. The 
stripped-down candidates are scored again and re-submitted to the entire procedure 
of ambiguity resolving. It can be noted that the full track fit is invoked after the track 
scoring stage in order to save computational load. 

The set of output track segments from the pixel and SCT detectors are submitted 
to the TRT track extension, which extrapolates the track segments through the TRT 
and searches for compatible segments in this drift tube detector. The search and 
subsequent resolution of drift-time ambiguities are done using a line fit in coordinate 
projections making the track model approximately linear [7]. At high levels of pile- 
up the occupancy of the TRT is so high that including drift-time hits not always 
increases track reconstruction performance. However, tube hits (i.e., using only the 
position of the centre straw) in all cases contribute to electron identification through 
information about transition radiation. 

In order to find tracks originating, e.g., from secondary vertices or photon 
conversions, a secondary track reconstruction strategy starts out by finding track 
segments in the TRT using a Hough transform [7]. The TRT segments are then 
back-tracked into the SCT, which allows finding small track segments in the silicon 
detector that were not found during the first inside-out pass. 

Electron reconstruction requires special attention [14] due to the potentially 
large amounts of emitted bremsstrahlung. In ATLAS, there is no separate iteration 
for electron track reconstruction. However, specific handling of bremsstrahlung is 
triggered by electromagnetic showers during the entire track reconstruction chain. 
The combinatorial Kalman filter allows for kinks in the track candidates if they are 
inside a region compatible with an electromagnetic shower. The ambiguity solver 
uses a global LS fit which allows for bremsstrahlung breakpoints in the track model. 
During electron identification, a full Gaussian-sum filter is invoked. 

The primary vertex reconstruction in ATLAS is divided into the two classical 
categories vertex finding and vertex fitting [15]. The input is the set of reconstructed 
tracks in an event selected according to a set of quality criteria. This set is first used 
to obtain a seed position for the primary vertex. The seed position in the bending 
plane is the center of the beam spot, whereas the longitudinal coordinate is defined 
as the mode of the longitudinal position of the tracks at their respective points of 
closest approach to the beam spot. Subsequently, the set of tracks and the seed 
position is submitted to an iterative vertex finding procedure using the adaptive 
vertex fit (AVF) with annealing (Sect. 8.2.2). A typical distribution of track weights 
for a set of values of the temperature parameter T is shown in Fig. 10.3. 

After the final iteration, the tracks with weights so small that they can be 
considered incompatible with the reconstructed vertex are removed from the vertex 
candidate and returned to the pool of unused tracks. The unused tracks are then 
submitted to a new iteration of the vertex finding procedure, which continues 
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Fig. 10.3 Histogram of track weights in the adaptive vertex fit for a set of different temperatures. 
(From [15], reproduced under License CC-B Y-4.0) 


until all tracks have been used or no additional vertex can be found among the 
remaining tracks. For secondary vertex finding inside jets, see [16]. Stand-alone 
vertex reconstruction in the muon spectrometer of ATLAS is described in [17]. 


10.3 CMS 


For a brief description of the CMS tracking system, see Sect. 1.6.1.3. Although 
the pixel detector and the silicon strip detector are mechanically independent sub- 
detectors with different sensor technology, they are considered as a single tracking 
detector as far as track finding and track fitting are concerned. 

The CMS track reconstruction is based on an iterative approach [18-21]. The 
principal difference between iterations is the configuration of the seed generation 
and the target tracks. The iterative search starts with the tracks that are the least 
difficult to find, and proceeds to more difficult classes: low momentum tracks, 
tracks from short-lived decays, and tracks from long-lived decays. In each iteration, 
hits associated to high-quality tracks are masked, reducing the combinatorial 
overhead for the following iteration. Finally, special iterations that improve track 
reconstruction in high-density environments such as jets are executed, followed 
by iterations targeting muon tracks in combination with the muon chambers. For 
a recent comprehensive overview of the tracking performance, see [22]. 

Each iteration consists of four steps: 


1. Seed generation. Initial track segments, called seeds, are found by a combina- 
torial search or by a cellular automaton [23]; see also Sect. 5.1.5. Using the 
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information of three or four hits pixel and/or strips, the trajectory parameters 
and the corresponding uncertainties of the seed are computed. 

2. Track finding. Each seed is used as the starting segment of the combinatorial 
Kalman filter (Sect.5.1.7). At most one missing hit is allowed in a track 
candidate. 

3. Track fitting. A Kalman filter or, for electron candidates, a Gaussian-sum 
filter/smoother (Sect. 6.2.3) is performed to obtain the final estimate of the track 
parameters at the interaction point exploiting the full trajectory information. The 
Kalman filter is also available in a parallelized version [24—26]. 

4. Track classification. Tracks are divided into classes according to different track 
quality criteria; see also Sect. 6.4. 


The twelve tracking iterations used after the pixel upgrade to four layers are listed 
in Table 10.1, showing the seed type and the targeted tracks [20]. Iteration 9 is 
special insofar as it improves track reconstruction in jets and hadronic t lepton 
decays by re-clustering pixel hits, using the jet direction to predict the expected 
cluster shape and charge [27, 28]. Iterations 10 and 11 reconstruct global muon 
tracks. For the combined electron reconstruction with tracker and electromagnetic 
calorimeter, see [29] and Sect. 9.4. 

Primary vertex reconstruction starts with selecting tracks that are produced 
promptly by setting a threshold on their transverse impact parameter [18]. The 
selected tracks are then clustered on the basis of their z-coordinates at their point 
of closest approach to the center of the beam spot, allowing for an arbitrary 


Table 10.1 List of the tracking iterations in CMS used after the Pixel Tracker upgrade with the 
corresponding seeding configuration used and target tracks [20] 


Iteration Step Name Seeding Target Track 


0 Initial pixel quadruplets prompt, high pr 
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number of clusters. The algorithm of choice is clustering by deterministic annealing 
(Sect. 7.2.4). Vertex candidates with at least two tracks are fitted by an adaptive 
vertex fitter (Sect. 8.2.2). Finally the signal vertex is determined as the primary 
vertex with the highest weighted sum of the squared momenta of the jets and isolated 
tracks associated to the vertex. 

Iterations 4 and 5 of the track finding target secondary tracks from short-lived 
decays, for instance B hadrons. Other candidate tracks for inclusion in a secondary 
vertex are tracks rejected by the adaptive vertex fit of the signal vertex. Finding 
secondary vertices is then basically a combinatorial search, followed by a vertex fit 
and possibly a kinematic fit for each candidate. 

Tracks from long-lived decays, for instance K-short mesons and A baryons, are 
the targets of iterations 6-8, each for a different range of the radial distance r of 
the decay from the beam line. Reconstruction of these decay vertices is again a 
combinatorial search followed by a vertex fit and a kinematic fit. 

CMS also has an independent reconstruction of tracks and primary vertices based 
purely on pixel hits. This is significantly faster than the standard track and vertex 
reconstruction chain and therefore a valuable tool for the high-level trigger [18, 30]. 


10.4 LHCb 


Track reconstruction in the LHCb detector uses hits in the VELO, TT and T stations; 
see Sect. 1.6.1.4 and Fig. 10.4. Depending on which detectors are crossed by a 
particle, different track types are defined [31]: 


T-stations 


veto |! 


upstream track long track 


downstream track 
VELO track 


"NN T-track 


Fig. 10.4 Schematic diagram of the LHCb tracking system and the five track types. (From [31], 
reproduced under License CC-BY-4.0) 
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* VELO tracks have hits in the VELO. They are particularly important in the 
primary vertex reconstruction. They can be extended to upstream and long tracks 
during reconstruction. 

* Upstream tracks have hits in the VELO and TT stations. In general their 
momentum is too low to traverse the magnet and reach the T stations. If they 
do, they can be extended to long tracks. 

* Long tracks have hits in both the VELO and the T stations, and optionally in TT. 
They are the most important set of tracks for physics analyses. 

* Downstream tracks have hits in the TT and T stations. They are important for the 
reconstruction of long-lived particles that decay outside the VELO acceptance. 

* T tracks pass only through the T stations. Like the downstream tracks, they are 
useful for particle identification in the Ring Imaging Cherenkov detectors. 


Track reconstruction is done in the two stages of the high-level trigger, HLT1 
and HLT2 (Sect. 2.1.3). In fact, the full event reconstruction runs in the high-level 
trigger [31, 32]. 

The partial track reconstruction in HLTI starts with finding straight lines in the 
VELO, which are fitted by a simplified Kalman filter. The fitted tracks are used 
for the reconstruction of the primary vertices. The primary vertex finding, see [33] 
and [34, Appendix A], proceeds in two steps: seeding and fitting. The 3D seeding 
algorithm starts with a base track and looks for tracks close to it. If at least four close 
tracks are found, a robust average of the points of closest approach of all track pairs 
is computed and stored as a seed. The tracks are marked as used and the next base 
track is processed. The fast seeder clusters tracks according to the z-coordinates of 
their points of closest approach to beam line, similar to the primary vertex finder of 
CMS (see Sect. 10.3). 

Before fitting, the seeds are ordered in descending track multiplicity, so that 
primary tracks are fitted first, and the probability of incorrectly labeling a secondary 
vertex as primary is minimized. The vertex fit is a redescending M-estimator with 
Tukey's biweight function, see Table 6.1 in Sect. 6.2.1. HLT1 proceeds with forming 
upstream tracks by extrapolating VELO tracks to the TT. High-momentum tracks 
are extended to the T stations. The resulting long tracks are fitted with a full Kalman 
filter and sent to the fake track rejection by a neural network [35]. 

HLT2 performs the full track reconstruction, in the sequence shown in Fig. 10.5. 
VELO tracks are extended to long tracks, without a threshold on the transverse 
momentum. Next, a stand-alone track finding is done in the T stations [36]. The 
found tracks are combined with VELO to long tracks. Tracks produced in the decays 
of long-lived particles outside the VELO are reconstructed from track segments in 
the T stations that are extrapolated backwards and combined with hits in the TT. 
The final steps are a full track fit with the Kalman filter, fake track rejection by a 
neural network, and removal of duplicates. 
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Fig. 10.5 Sequence of the full track reconstruction in LHCb. (From [31], reproduced under 
License CC-BY-4.0) 
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Chapter 11 A 
Belle II and CBM Era 


Abstract The chapter gives an overview of the tracking and vertexing algorithms 
of two experiments not at the LHC, Belle II at SuperKEKB in Japan and CBM at 
FAIR in Germany. 


11.1 Belle II 


The Belle II experiment [1] started operation in late 2018. The following overview 
of the reconstruction methods reflects the status after a couple of months of running; 
further developments and adaptations that have been implemented in parallel to the 
rising luminosity of the SuperKEKB collider are documented in [2]. 

The tracking system of Belle II has two principal components, the vertex detector 
(VXD) and the central drift chamber (CDC); see Sect. 1.6.2.1. The VXD consists of 
two parts: the PXD with two layers of DEPFET pixel sensors, and the SVD with 
four layers of double-sided silicon strip sensors. The overall design of the track 
reconstruction is shown in Fig. 11.1. 

Track finding in the CDC starts with a filter, implemented as a boosted decision 
tree, that rejects background hits. This is followed by two track finders. The global 
track finder employs a Legendre transform (Sect. 5.1.4), to find complete tracks 
originating from the interaction region. Peak finding in the Legendre space is done 
by a fast iterative quadtree algorithm [3]. The local track finder builds segments 
from the hits in a superlayer. This is followed by a multi-stage combination and 
filter algorithm that connects valid segments and tracks, rejects fake segments, and 
outputs the final CDC track sample [4]. 

Stand-alone track finding in the SVD [5, 6] is based on a cellular automaton (CA) 
complemented by a sector map (Sect. 5.1.5). The sector map stores prior information 
about which triplets of hits can be part of a track. The vertex position can be used as 
a virtual hit. Track segments that pass all cuts form the cells of the CA. Figure 11.2 
shows the final state of the CA with a toy event, without background. 

When the CA has finished, each track candidate is assigned a quality index, 
which is the p-value of a triplet fit (Sect. 6.1.5), a circle fit, or a Riemann helix 
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Track 
Finding 


Track fit Fitting 


Fig. 11.1 Scheme of the track reconstruction in Belle II. (Courtesy of B. Scavino, Mainz) 


Fig. 11.2 Final state of the 
CA with a toy event in the 
VXD. The colors of the cells 
(track segments) represent 
their states. (Courtesy of J. 
Lettenbichler, Vienna) 


fit (Sect. 6.3). Finally, non-overlapping candidates are selected, either with a greedy 
algorithm or a Hopfield network (Sect. 5.3). At the time of writing, the triplet fit and 
the greedy algorithm are the defaults. 

The merging of CDC and VXD tracks is based on a boosted decision tree that is 
trained on valid combinations of CDC and VXD tracks. It takes the track parameters 
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as determined by a Deterministic Annealing Filter (DAF, Sect. 6.2.2) as input and 
returns a score in the interval (0, 1). Combinations above a threshold are accepted. 
CDC tracks, SVD tracks and combined tracks are extrapolated into the PXD to 
pick up hits by a combinatorial Kalman filter [7], and then passed to the DAF. The 
implementation of the DAF is the one in GENFIT [8-10]. 

The B mesons produced by the decay of the Y (4S) resonance are reconstructed 
hierarchically [11], and their decay chains are fitted by a tree fitter based on [12]. 
VO decays are reconstructed by pairing positive and negative tracks and performing 
a vertex fit [13]. 


11.2 CBM 


CBM is a heavy-ion experiment at FAIR [14], currently in the preparation phase. 
The first beam is expected in 2024. The interaction rate will be up to 10 MHz, 
with up to 1000 charged particles produced in a central collision. Full online event 
reconstruction and selection will be required [15]. This is the task of the First Level 
Event Selection Package (FLES, [16, 17]). The algorithms in FLES exploit vector 
instructions for intra-processor parallelism and parallelism between cores on the 
event level. 

The FLES package consists of several modules: track finding, track fitting, short- 
lived particles finding, event building and event selection [18], see Fig. 11.3. CBM 
runs without a trigger; time-stamped data will be put into a buffer in time slices of 
a certain length instead. The association of tracker hits with events is performed by 
software [18], using the hit time information in the Silicon Tracking System (STS, 
see Sect. 1.6.2.2). The resolution of the hit time is 5 ns. 

Track finding in the STS (Sect. 1.6.2.2) is done by a 4D Cellular Automaton, 
where the fourth dimension is time [19, 20]. The cells of the CA are triplets of hits 
correlating in space and in time. The principle of the CA track finder is illustrated 
in Fig. 5.5 in Sect. 5.1.5. For track finding efficiencies, see [18]. 

The track candidates from the CA are sent to the Kalman Filter Track Fitter. 
The Kalman Filter is “SIMDized”, i.e., uses the Single-Instruction Multiple-Data 
features of the processors employed [21]. The extrapolation in the inhomogeneous 
magnetic field uses the approximate analytical method described in Sect. 4.3.2.2. 

After the primary vertex fit with the Kalman Filter, tracks are sorted into primary 
and secondary tracks. The reconstruction of short-lived particles is done by the 
KFParticle package [22], also based on the Kalman Filter and designed to fit decay 
chains. The reconstruction of a short-lived particle is implemented iteratively. The 
KFParticle package starts with an initial approximation of the secondary vertex, 
adds particles one after another, refines the state vector and gives the optimal values 
after the last particle [16, 17]. Besides the geometrical constraints, mass constraints 
and topological constraints can be imposed on the secondary vertex. 

Track reconstruction in the three stations of the transition radiation detector is 
done by a cellular automaton as well [23]. 
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Fig. 11.3 Flow diagram of the First Level Event Selection Package in CBM. (From [17], 
reproduced under License CC-BY-4.0) 
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Appendix A 
Jacobians of the Parameter 
Transformations 


Transformation from One Curvilinear Frame to Another 


The non-zero terms of the Jacobian matrix are: 
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Transformations Between a Local Frame and the Curvilinear 
Frame 
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where m = t x n and 


(n - k) = (v - k) (n - u) — (u - k) (n: v), 
Qn - j) = (v- j) (n-u)— (u j) (n: v). 


Transformations Between the Intermediate Cartesian Frame 
and the Local Frame 
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Appendix B 
Regularization of the Kinematic Fit 


If V in Eq. (8.61) is not of full rank (singular), it can be regularized by adding a 
positive multiple of the identity matrix: 


V°=V+45I with 5>0. 


The eigenvalues of V? are equal to the non-negative eigenvalues of V plus ô. 
Therefore, they are all positive and V? is regular. The corresponding estimate a 
(see Eq. (8.68)) is equal to: 

a 8 pT 5 

a —y—V'H Gg(Hy-cd), 
with 
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Application of the Woodbury identity [1, Theorem 18.2.8] yields: 
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It follows that 
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Appendix C 
Software 


This appendix presents some software platforms and packages that are not tied to a 
specific experiment and are of particular interest to algorithm developers and users. 


FairRoot 


FairRoot [1] is an object-oriented framework for simulation, reconstruction and 
analysis. It contains base classes for easy definition of the user’s experimental 
setup. Using the concept of Virtual Monte Carlo [2] different simulation programs 
can be deployed without changing the user code or the geometry description. The 
framework delivers classes for detector handling and magnetic field definition, so 
that the user can perform simulations with different detector setups and test different 
reconstruction algorithms. The framework is used by several experiments at FAIR, 
as well as by some groups outside. 


ACTS: A Common Tracking Software 


ACTS [3, 4] encapsulates track reconstruction algorithms in a generic, experiment- 
independent framework. It provides data structures and algorithms for simulation, 
track finding and track fitting, including track parameter propagation. Special 
emphasis is put on parallelization and vectorization of the software. The imple- 
mentation is independent of particular assumptions on the detector technology, the 
detector configuration and the magnetic field. The further development of the project 
depends on active input from the tracking community. ACTS is thus an excellent 
testbed for new ideas on track finding and track fitting. 
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GBL: General Broken Lines 


The general broken lines (GBL) algorithm [5, 6] is a fast track refit based on 
the concept of linearized regression with breakpoints (Sects. 6.1.3 and 6.1.4). The 
track model is implemented in such a way that the model matrix has a very 
special structure and can be inverted in linear time with respect to the number of 
measurements. The GBL fit computes the full covariance matrix of the corrections 
to the track parameters along the track and gives the necessary input to track-based 
alignment with global methods, in particular Millepede II [7]. 


GENFIT 


GENFIT [8-11] is a generic toolkit for track fitting. It provides classes for 
measurements, track representations and fitting algorithms. There are predefined 
measurement classes for various detector types, including planar detectors, drift 
chambers, and time projection chambers. Track representations include code for 
track parametrization and track extrapolation by Runge-Kutta. The available fitters 
are the Kalman filter (with or without reference track), the Deterministic Annealing 
Filter, and the General Broken Lines. There is an interface to the RAVE vertex 
reconstruction toolbox, see below. 

A new and extended version, called GENFIT2, is described in [12]. Studies of its 
performance in the software frameworks of the experiments Belle II at SuperKEKB 
and PANDA at FAIR (Chap. 11) can be found in [12]. 


RAVE 


RAVE [13] is an experiment-independent toolkit for vertex reconstruction. It comes 
with a stand-alone framework VERTIGO for debugging, testing and analyzing. 
The vertex fitters are the extended Kalman filter (Sect. 8.1.2.2) and the adaptive 
vertex fitter (Sect. 8.2.2). Secondary vertex finding by an iterated adaptive fitter 
(AVR, Sect. 7.3.3) is implemented as well. The toolkit can be embedded in the 
software framework of a collider experiment with minimal effort. 
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Glossary and Abbreviations 


ACTS A Common Tracking Software 
ALICE A Large Ion Collider Experiment, an experiment at the LHC 
ATLAS A Toroidal LHC Apparatus, an experiment at the LHC 
AVF Adaptive Vertex Fit 
Belle II An experiment at the SuperKEKB collider 
CBM Compressed Baryon Matter, a future experiment at FAIR 
CA Cellular Automaton 
CDF Collider Detector at Fermilab, an experiment at the Tevatron 
CDF Cumulative Distribution Function 
CERN European Laboratory for Particle Physics in Geneva 
CKF Combinatorial Kalman Filter 
CMS Compact Muon Solenoid, an experiment at the LHC 
DAF Deterministic Annealing Filter 
FAIR Facility for Antiproton and Ion Research, an accelerator complex 
at GSI 
GBL General Broken Lines 
GNN Graph Neural Network 
GSF Gaussian-Sum Filter 
GSI German federal research institute for heavy ion research in Darm- 
stadt 
HL-HLC High-luminosity stage of the LHC, starting in 2026 
HLT High-level trigger 
KEK The Japanese National Laboratory for High-Energy Physics 
LHC Large Hadron Collider, the large proton-proton collider at CERN 
LHCb Large Hadron Collider beauty, an experiment at the LHC 
LS Least Squares 
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LSTM Long Short-Term memory 
PDF Probability Density Function 
RNN Recurrent Neural Network 


SuperKEKB Electron-positron collider (B-factory) at KEK 


Index 


A 
A common tracking software (ACTS), 91, 193 
A large ion collider experiment (ALICE), 12, 
91, 169 
tracking detectors, 12 
Alignment, 10 
track based, 10, 11, 109 
Artificial retina, 84 
A toroidal LHC apparatus (ATLAS), 10, 13, 
94,171 
fast tracker, 96 
hadronic interaction, 163 
photon conversion, 162 
tracking detectors, 13 
AVR, see Vertex finding 


B 
Belle II, 17, 181 

tracking detectors, 17 
Breakpoint, 107, 124 
Bremsstrahlung, see Material effects 


C 
CA, see Cellular automaton 
Case studies 
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CMS, 173 
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Cellular automaton (CA), 87, 88, 170, 173, 
181, 183 
Central tracker, 11, 13, 169, 173, 183 
Chi-square statistic, 27, 28, 39, 40, 42, 93, 97, 
98, 105, 110, 117, 118, 121, 124, 
125, 148, 149, 153, 154, 157, 163 
Circle fit, 116 
Chernov and Ososkov's, 117 
conformal mapping, 116 
Karimäki’s, 117 
Riemann fit, 119 
CKF, see Kalman filter, combinatorial 
Clustering, 29, 44, 116, 132 
agglomerative, 44, 138, 140, 160, 
161, 164 
deterministic annealing, 135, 175 
divisive, 44, 133, 138 
greedy, 138 
hierarchical, 44 
model-based, 45, 133 
partitional, 44 
Combinatorial Kalman filter, see Kalman filter, 
combinatorial 
Compact muon solenoid (CMS), 13, 24, 94, 
173 
hadronic interaction, 164 
photon conversion, 163 
tracking detectors, 13 
track trigger, 97 
Compressed baryon matter (CBM), 
17, 183 
tracking detectors, 17 
Conformal transformation, 81, 116 
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D 
DAF, see Deterministic annealing filter (DAF) 
Deterministic annealing, 90,99, 111, 112, 134, 
135, 154 
Deterministic annealing filter (DAF), 112, 183, 
194 
Drift chamber, 4-6, 17, 96, 112, 181, 194 
cylindrical, 5 
drift tubes, 6 
planar, 4 


E 
EM algorithm, 133, 134 
with deterministic annealing, 134 
Energy loss, see Material effects 
Equation of motion, 49 
Error propagation, 43, 57 
homogeneous magnetic field, 59 
inhomogeneous magnetic field, 63 
Event reconstruction, 23 
physics objects reconstruction, 28 
track quality, 28 
track reconstruction, 26 
global, 27 
hit generation, 27 
local, 27 
trigger and data acquisition, 23 
vertex reconstruction, 28 
Extended Kalman filter, see Kalman filter, 


extended 

F 

Facility for antiproton and ion research (FAIR), 
13,183 


CBM experiment, 17, 183 
FairRoot, 193 
Function minimization, 33 
descent methods, 34 
gradient-free methods 
simplex algorithm, 37 
gradient methods 
conjugate gradients, 36 
line search, 34 
quasi-Newton methods, 35 
steepest descent, 35 
Newton-Raphson method, 33 


G 

Gaussian-sum filter, 78, 114-116 
GBL, see General broken lines (GBL) 
General broken lines (GBL), 109, 194 
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GENFIT, 109, 112, 183, 194 
GNN, see Neural network 


H 
Helix fit, 120 
High-luminosity LHC, 11,95, 97 
Hit generation 
drift chamber, 5, 6 
MWPC, 4 
pixel sensor, 8 
silicon strip sensor, 8 
TPC, 7 
Hough transform, 82, 98, 172 


I 
Inner tracker, see Central tracker 


K 
Kalman filter, 41, 42, 92, 124, 183, 194 
backward, 43, 107, 124 
combinatorial, 92—94, 98, 107, 115, 163, 
169, 171, 174, 183 
extended, 43, 106, 149 
chi-square statistic, 149 
gain matrix, 42 
information filter, 43 
linear 
chi-square statistic, 42 
residuals, 42 
smoother, 41—43 
residuals, 43 
square-root filter, 43 
Kinematic fit, 155 
Kink finding, 28, 124 


L 
Large hadron collider beauty (LHCb), 15, 25, 
175 

hadronic interaction, 164 
tracking detectors, 15 

Legendre transform, 85, 181 

LHC experiments, 12, 131, 169 
ALICE, 12, 91, 169 
ATLAS, 10, 13,94, 171 
CMS, 13, 24, 94, 173 
LHCb, 15,25,175 


M 

Magnetic field, 49 
homogeneous, 50, 53, 59 
inhomogeneous, 50, 54, 63 


Index 


Material effects, 67 
bremsstrahlung, 77, 114, 125 
approximation by normal mixture, 78 
Bethe-Heitler model, 77 
distribution, 77 
energy loss, 76 
Bethe-Bloch formula, 76 
by bremsstrahlung, 77 
by ionization, 76 
in track propagation, 76 
multiple scattering, 67, 104, 105 
angular distribution, 68 
approximation by normal mixture, 70 
covariance matrix, 104, 105 
Highland formula, 68 
single scattering angle, 67 
single scattering variance, 68 
in track propagation, 71 
thick scatterer, 74 
thin scatterer, 71 
M-estimator, 110, 152, 153 
adaptive weights, 111 
Huber’s weights, 111 
Tukey’s bisquare weights, 111, 176 
weight function, 111 
Micro-pattern gas detector, 7 
Multiple scattering, see Material effects 
Multi-wire proportional chamber (MWPC), 4 
Muon tracking system, 11, 13, 14, 17, 24, 26, 
71 
MWPC, see Multi-wire proportional chamber 
(MWPC) 


N 

Neural network, 89 
graph, 91 
Hopfield, 89, 99, 182 
recurrent, 91 


(0) 

Outlier, 27, 105, 152-154 
chi-square statistic, 123 
detection of, 28, 105, 123, 155 
track-correlated, 123 
track-uncorrelated, 123 


P 

Particle identification, 29 
Cherenkov detectors, 29 
ionization, 29 
particle flow, 30 
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tracking and calorimetry, 29 
transition radiation, 29 
Pattern matching, 94, 96-98 
Pattern recognition, see Track finding 
Perigee parameters, 150 
Pile-up, 23, 30, 131, 140, 160, 172 
Primary vertex, see Vertex 


R 
RAVE, 139, 194 
Regression 
linear, 39 
chi-square statistic, 39 
nonlinear, 40, 103, 146 
chi-square statistic, 40, 105, 148 
residuals, 39, 105 
robust, 110 
standardized residuals, 39, 105 
with breakpoints, 107, 108 
RNN, see Neural network 
Runge-Kutta, see Track propagation 


S 
Secondary vertex, see Vertex 
Seed 
track, 92, 93, 163, 169, 171, 174 
vertex, 161, 172, 176 
Sensor 
pixel, 8 
silicon strip, 8 
Signal vertex, see Vertex 
Smoother, see Kalman filter 
State space model, 41, 106, 123 
linear, 41 
nonlinear, 43 
SuperKEKB, 17, 181 
Belle II experiment, 17, 181 


T 
Template matching, see Pattern matching 
Time projection chamber (TPC), 6, 13, 169, 
194 
TPC, see Time projection chamber (TPC) 
Track candidate selection, 97-99 
Track finding, 81 
artificial retina, 84 
cellular automaton, 87 
combinatorial Kalman filter, 92 
conformal transformation, 81 
graph neural network, 91 
Hopfield network, 89 
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Track finding (cont.) test of track hypothesis, 28, 121 
Hough transform, 82 track length, 122 
Legendre transform, 85 Track reconstruction, 47 
online, 96 in ALICE, 169 
pattern matching, 94 in ATLAS, 171 
recurrent neural network, 91 in Belle II, 181 
track following, 92 in CBM, 183 

Track fit, 103 in CMS, 173 
affine transformation, 96, 98, 109 in LHCb, 176 
deterministic annealing filter, 112 Trigger, 23 
extended Kalman filter, 106 CMS, 24 
Gaussian-sum filter, 114 high-level, 25 
general broken lines, 109 low-level, 24 
least-squares regression, 103 LHCb, 25 
M-estimator, 110 high-level, 26 
regression with breakpoints, 107 low-level, 26 
robust, 27, 110 Triplet fit, 109, 181 


triplet fit, 109 
Track following, 92 
Tracking detector, 3 


alignment, 10 V 
calibration, 4—8, 28 Vertex, 28 
drift chamber, 4—6, 17, 96, 112, 181, 194 primary, 28, 131, 154 
gaseous, 4 secondary, 28, 131 
multi-wire proportional chamber, 4 signal, 28, 131, 175 
pixel, 8, 93, 164, 169 Vertex detector, 11, 13, 16, 17, 95, 96, 169, 
scintillating fiber, 9 171,181 
semiconductor, 7 Vertex finding, 28, 131 
silicon drift, 9 in 1D, 133 
silicon strip, 8, 13, 16, 17,97 deterministic annealing, 135 
time projection chamber, 6, 13, 169 divisive clustering, 133 
Tracking system, 11 EM algorithm, 134 
Track model, 49 model-based clustering, 133 
Track parametrization, 50 in 3D, 138 
barrel-type, 50 adaptive vertex reconstructor, 139, 140 
curvilinear, 52 greedy clustering, 138 
planar, 51 iterated estimators, 138, 154 
transformation medical imaging, 140 
between curvilinear and local frames, preclustering, 138 
61, 189 topological, 139 
between curvilinear frames, 60, 187 secondary vertex, 159 
between global Cartesian and local decay vertex, 159 
frames, 62, 190 hadronic interaction, 163 
Track propagation, 43, 52 interaction vertex, 159 
analytical, 55, 183 long-lived particle, 161 
homogeneous magnetic field, 53 photon conversion, 162 
inhomogeneous magnetic field, 54 short-lived particle, 160 
Runge-Kutta-Nystróm method, 50, 54, 194 Vertex fit, 28, 143 
Track quality, 28, 121 adaptive, 139, 153, 172, 175 
chi-square statistic, 121 chi-square statistic, 154 
detection of outliers, 123 curved tracks, 146 
kink finding, 124 chi-square statistic, 148, 149 
chi-square statistic, 124 extended Kalman filter, 149 


number of holes, 122 nonlinear regression, 146 


Index 


perigee parameters, 150 
kinematic fit, 155, 157 
chi-square statistic, 157 
quality, 154 
robust, 152 
M-estimator, 152 
straight tracks, 143 
chi-square statistic, 145 
exact fit, 143 
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Karimäki’s fit, 145 

Vertex quality, 154 
Vertex reconstruction, 129 

in ALICE, 169 

in ATLAS, 172 

in Belle II, 183 

in CBM, 183 

in CMS, 174 

in LHCb, 176 


